Abstract

Motivated by the analysis of social networks, we study a model of random networks that has both a given degree distribution and a tunable clustering coefficient. We consider two types of growth processes on these graphs: diffusion and the symmetric threshold model. The diffusion process is inspired by epidemic models. It is characterized by an infection probability, each neighbor transmitting the epidemic independently. In the symmetric threshold process, the interactions are still local, but the propagation rule is governed by a threshold (which might vary among the different nodes). An interesting example of a symmetric threshold process is the contagion process, which is inspired by a simple coordination game played on the network. Both types of processes have been used to model the spread of new ideas, technologies, viruses or worms, and results have been obtained for random graphs with no clustering. In this paper, we analyze the impact of clustering on these growth processes. While clustering inhibits the diffusion process, its impact on the contagion process is more subtle and depends on the connectivity of the graph: in a low-connectivity regime, clustering also inhibits the contagion, while in a high-connectivity regime, clustering favors the appearance of global cascades but reduces their size.

For both diffusion and symmetric threshold models, we characterize conditions under which global 
cascades are possible and compute their size explicitly, as a function of the degree distribution and 
the clustering coefficient. Our results are applied to regular or power-law graphs with exponential 
cutoff and shed new light on the impact of clustering. 

Keywords: Contagion threshold, Diffusion, Random graphs, Clustering 

1 Introduction 

Many network phenomena are well modeled as spreads of epidemics through a network. However, depending on the motivation, different mechanisms are at work, and we believe that different models should be used. For example, to model the spread of worms or email viruses, and, more generally, faults, a simple diffusion model is often used, where each node, when it becomes infected or faulty, 'transmits' the infection to her neighbors independently and with a given probability. There is now a vast literature on such epidemics on complex networks (see [24] for a review). In such a model, the transmission mechanism is independent of the local conditions faced by the agents concerned. However, when nodes represent agents in a social network and we want to model the spread of innovations as a social process, an individual's adoption behavior is highly correlated with the behavior of her contacts. In this case, there is a factor of persuasion or coordination involved, and relative considerations tend to be important in understanding whether some new behavior or belief is adopted [22]. In social contexts, the spread of information and behavior often exhibits features that do not match well those of the diffusion model just described, where an individual is influenced by each of her neighbors independently. In such a context, the linear threshold model, originally proposed by Granovetter [12], captures in a simple way the local correlations among individuals. We will study the symmetric threshold model proposed by Lelarge [20], which generalizes both bootstrap percolation [3] and the contagion model [22].

In this paper, we analyze two different types of epidemics modeled by simple growth processes, which we now describe. In a given network, each node can be either active or inactive. The diffusion process corresponds to the case where each node of the network that becomes active transmits the activation to each of her neighbors with a given probability, independently of the others. On the other hand, in the symmetric threshold model, a threshold is associated with each node, and a node of the network becomes active as soon as the number of her active neighbors exceeds her threshold. As in the original model of [2], thresholds are (possibly) random, with a distribution depending on the degree of the node, and thresholds are independent among nodes. The symmetric threshold model will allow us to analyze the contagion process [22]. For both models, we first consider the case where there is only one initial active node and characterize conditions under which global cascades are possible, that is to say, when a positive fraction of the population is active at the end of the process. In such cases, we compute the probability of a cascade and its size. Then, we consider the cascade size when a positive fraction of the population is initially active. The initial activations are random in that case, and the probability that a node belongs to the seed might depend on its degree.

We now describe the model of random graphs studied in this paper. For many real-world networks, the underlying graph G is a power-law graph, i.e. a graph whose degree distribution follows a power law. Random graphs with a given degree sequence make it possible to model such behavior. This model is usually called the configuration model [5]. There is a vast literature on the analysis of diffusion on such graphs [24]. The contagion process has also been studied on such graphs, through heuristics [28] or rigorously in [20] or [2]. Random graphs are not considered to be highly realistic models of most real-world networks. They are used as a first approximation because they are a natural choice for sparse interaction networks in the absence of any known geometry. One essential drawback of this model is that these graphs are 'locally tree-like': short cycles are very rare. However, real-world networks are often highly clustered, meaning that there is a large number of triangles and other short cycles [23]. For social graphs, this is a consequence of the fact that friendship circles typically overlap strongly, so that many of our friends are also friends of each other.

There are works in the physics and biology literature on models of random graphs with clustering [25]. Our model is inspired by [26], which makes it possible to model random graphs with positive clustering and possibly a power-law degree distribution. The idea is to 'add' clustering to a standard configuration model by replacing some vertices by cliques. Choosing the fraction of replaced vertices yields a graph whose amount of clustering can be tuned by adjusting the parameters of the model. This model generalizes the standard configuration model to incorporate clustering. Understanding how clustering affects diffusion and contagion remains largely an open question. Our work is a first step towards addressing this issue in a systematic and rigorous way. In particular, we are able to give a rigorous analysis of the impact of a variation of the clustering coefficient while keeping the degree distribution of the graph fixed. To the best of our knowledge, this sensitivity analysis is new, and it gives a number of insights into the impact of clustering. In [26] and [11], the diffusion process on such graphs is analyzed through a heuristic approximation by a branching process with additional cliques. We derive rigorous proofs for these results. A different model of random graphs with clustering, called random intersection graphs, has been studied rigorously in [9], and in [7] the diffusion process is studied on such graphs. However, the degree distribution of this kind of graph has to be a Poisson distribution, and clustering cannot vary independently of the degree distribution. To the best of our knowledge, no results on the contagion model had been proved for random graphs with clustering before our work. Recently, [1] derived bounds which are valid for the contagion model [22] on deterministic networks. Our analysis, in contrast, gives asymptotic results as the size of the graph tends to infinity, and allows us (by looking at a more specific model) to identify precisely the impact of clustering.

A preliminary version of this work, without proofs, appeared in [8].
The paper is organized as follows. In Section 2, we present the graph model and compute its asymptotic degree distribution and its asymptotic clustering coefficient. We explain how to tune the clustering coefficient of the model while keeping the asymptotic degree distribution fixed. In Section 3, we derive the minimal value of the infection probability in our random graph model with clustering such that a global diffusion is possible, and we compute the size of the diffusion in that case. We apply these results to random regular graphs and power-law graphs and show that clustering inhibits the diffusion process in these cases. We also compute the cascade size for the diffusion process with degree-based activation. In Section 4, we derive the cascade condition for the symmetric threshold model on our random graph model with clustering, together with the size of the cascade when it occurs. Numerical evaluations in the particular case of the contagion process show that the effect of clustering on the contagion threshold depends on the mean degree of the graph, and that clustering decreases the cascade size when a cascade occurs. Finally, we compute the cascade size for the symmetric threshold model with a slight variant of the degree-based activation. Proofs are given in Section 5.

Notations. In the following, we consider asymptotics as $n \to \infty$, and we denote by $\xrightarrow{p}$ the convergence in probability as $n \to \infty$. The abbreviation 'whp' ("with high probability") means with probability tending to 1 as $n \to \infty$, and we use the notations $O_p(n)$ and $o_p(n)$ in a standard way (see [15] for instance): $X = o_p(n)$ means that, for every $\epsilon > 0$, $\mathbb{P}(|X| > \epsilon n) \to 0$ as $n \to \infty$. In addition, for integers $s \ge 0$ and $0 \le r \le s$, let $b_{sr}$ denote the binomial probabilities $b_{sr}(p) := \mathbb{P}(\mathrm{Bi}(s,p) = r) = \binom{s}{r} p^r (1-p)^{s-r}$.

2 Random graph model and its basic properties 

We first present the model for the random graph, and compute its asymptotic degree distribution and 
its asymptotic clustering coefficient (for two different definitions). 

2.1 Model of random graph with clustering 

We first consider the uniform random graph with a fixed degree sequence. Since this graph has asymptotically no clustering, we will then modify it to obtain a graph with clustering.

Let $n \in \mathbb{N}$ and let $\mathbf{d} = (d_i^{(n)})_{i=1}^n = (d_i)_1^n$ be a sequence of non-negative integers such that $\sum_{i=1}^n d_i$ is even. The integer $n$ is the number of vertices in the graph, and vertex $i \in [n]$ has degree $d_i$ in the graph. Let $G(n,\mathbf{d})$ be a graph chosen uniformly at random among all simple (i.e. with no multi-edges or self-loops) graphs with $n$ vertices and degree sequence $\mathbf{d}$ (assuming such graphs exist) [5].

We will let $n \to \infty$ and assume that we are given $\mathbf{d}$ satisfying the following regularity conditions,
which are standard in the random graph literature, see pT| : 

Condition 1. For each $n$, $\mathbf{d} = (d_i^{(n)})_{i=1}^n$ is a sequence of non-negative integers such that $\sum_{i=1}^n d_i$ is even. We assume that there exists a probability distribution $p = (p_r)_{r=0}^\infty$ (independent of $n$) such that:

(i) $n_r/n := |\{i : d_i = r\}|/n \to p_r$ as $n \to \infty$, for all $r \ge 0$;
(ii) $\lambda := \sum_r r p_r \in (0,\infty)$;
(iii) $\sum_{i=1}^n d_i^2 = O(n)$.

If $D_n$ is the degree of a vertex chosen uniformly at random among the $n$ vertices of $G(n,\mathbf{d})$, and $D$ is a random variable with distribution $p$, then (i) is equivalent to $D_n \xrightarrow{d} D$ (convergence in distribution). In addition, (iii) is equivalent to $\mathbb{E}[D_n^2] = O(1)$, which implies that the random variables $D_n$ are uniformly integrable, or equivalently that $\sum_d d n_d/n$ is uniformly summable; in particular, $\mathbb{E}[D_n] \to \mathbb{E}[D] = \lambda$.
Figure 1: Transformation

Random graphs $G(n,\mathbf{d})$ are 'locally tree-like', i.e. they contain very few (i.e. $o(n)$) short cycles. We now show that it is possible to generalize this model of random graphs to incorporate clustering in a simple way. The resulting model of random graphs will still be tractable for the analysis of the diffusion and symmetric threshold models. To 'add' clustering to $G(n,\mathbf{d})$, we replace some vertices by a clique whose size equals the degree of the vertex in the original graph. A vertex of degree $r$ in the original graph $G(n,\mathbf{d})$ is replaced by $r$ vertices with all the $r(r-1)/2$ edges between them, and each of these vertices is connected to exactly one of the neighbors of the vertex in the original graph $G(n,\mathbf{d})$, as illustrated in Figure 1. Note that if $r = 0$, i.e. if the original node is isolated, this procedure removes the node. By convention, a clique of size zero is empty.

In order to be able to tune the clustering coefficient of the graph, we do not replace all vertices by a clique, but make a probabilistic choice whether to replace a vertex or not: for all $r \ge 0$, $\gamma_r \in [0,1]$ represents the probability that a vertex of degree $r$ in $G(n,\mathbf{d})$ is replaced by a clique of size $r$ in the new model, denoted $G(n,\mathbf{d},\gamma)$, where $\gamma = (\gamma_r)_{r\ge0}$ is short notation for the sequence of $\gamma_r$'s. The choices to replace a vertex by a clique or not are made independently at each vertex of the original graph $G(n,\mathbf{d})$. More formally, for each vertex $i \in \{1,\dots,n\}$, let $X(i)$ be a Bernoulli random variable with parameter $\gamma_{d_i}$ (all Bernoulli random variables being independent of each other). We construct the random graph $G(n,\mathbf{d},\gamma)$ by replacing each vertex $i$ of $G(n,\mathbf{d})$ with $X(i) = 1$ by a clique of size $d_i$, where each vertex of the clique has exactly one neighbor outside the clique, this neighbor being a neighbor of $i$ in the original graph. All vertices $i$ with $X(i) = 0$ are unchanged. In particular, if $\gamma_r = 0$ for all $r \ge 0$, then we simply get $G(n,\mathbf{d},\mathbf{0}) = G(n,\mathbf{d})$, whereas if $\gamma_r = 1$ for all $r \ge 0$, all vertices of $G(n,\mathbf{d})$ are replaced by cliques (and isolated vertices are removed). With a slight abuse of notation, we write $G(n,\mathbf{d},\gamma)$ for the graph $G(n,\mathbf{d},\gamma)$ in which the sequence $\gamma$ is constant and equal to $\gamma$.
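The clique-replacement construction can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code. The helper names configuration_model and add_cliques are hypothetical, and the base graph is obtained by pairing half-edges uniformly at random, which may create self-loops or multiple edges. In the text, by contrast, $G(n,\mathbf{d})$ is the uniform simple graph.

```python
import random


def configuration_model(degrees, rng):
    """Pair half-edges uniformly at random (configuration multigraph).

    Assumption: loops and multi-edges are allowed here, unlike the uniform
    simple graph G(n, d) used in the text.
    """
    if sum(degrees) % 2:
        raise ValueError("the sum of degrees must be even")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]


def add_cliques(degrees, edges, gamma, rng):
    """Replace each vertex of degree r by an r-clique with probability gamma(r).

    Returns (number of new vertices, list of new edges). Each clique member
    carries exactly one of the original external edges, so degrees are kept.
    """
    next_id = 0
    slots = []        # slots[v][j]: new vertex carrying the j-th half-edge of v
    new_edges = []
    for d in degrees:
        if rng.random() < gamma(d):              # X(i) = 1: clique of size d
            members = list(range(next_id, next_id + d))
            next_id += d                         # d = 0: the vertex disappears
            new_edges.extend((a, b) for k, a in enumerate(members)
                             for b in members[k + 1:])
            slots.append(members)
        else:                                    # X(i) = 0: keep the vertex
            slots.append([next_id] * d)
            next_id += 1
    used = [0] * len(degrees)
    for u, w in edges:                           # reattach the external edges
        a = slots[u][used[u]]
        used[u] += 1
        b = slots[w][used[w]]
        used[w] += 1
        new_edges.append((a, b))
    return next_id, new_edges
```

For instance, with rng = random.Random(0) and deg = [3] * 1000, the call add_cliques(deg, configuration_model(deg, rng), lambda r: 0.5, rng) returns one sample of a clustered 3-regular (multi)graph, where each vertex is replaced by a triangle with probability 0.5.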



2.2 Degree distribution in $G(n,\mathbf{d},\gamma)$

As we will see in the next subsection, the procedure described above introduces clustering as soon as $\gamma_r p_r > 0$ for some $r \ge 3$. It also modifies the degree distribution of the graph, and we derive the new degree distribution here. Recall that each vertex of degree $r$ is either replaced (with probability $\gamma_r$) by $r$ vertices of degree $r$, or stays as a single vertex of degree $r$ (with probability $1-\gamma_r$). The following proposition gives the resulting asymptotic degree distribution of $G(n,\mathbf{d},\gamma)$.

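Proposition 2. We consider the model $G(n,\mathbf{d},\gamma)$ for a sequence $\mathbf{d}$ satisfying Condition 1 with probability distribution $p = (p_r)_{r\ge0}$ and clustering parameter $\gamma = (\gamma_r)_{r\ge0}$. For all $r \ge 0$, let $\tilde n_r$ be the number of vertices with degree $r$ in $G(n,\mathbf{d},\gamma)$, and let $\tilde n = \sum_r \tilde n_r$ be the total number of vertices in $G(n,\mathbf{d},\gamma)$. Then we have, as $n \to \infty$,

$$\frac{\tilde n}{n} \xrightarrow{p} \bar\gamma := \sum_{d\ge0}\left[d\gamma_d + (1-\gamma_d)\right]p_d > 0,$$

and, for all $r \ge 0$, the proportion of vertices with degree $r$ in $G(n,\mathbf{d},\gamma)$ has the following limit as $n \to \infty$:

$$\frac{\tilde n_r}{\tilde n} \xrightarrow{p} \tilde p_r := \frac{\left[r\gamma_r + (1-\gamma_r)\right]p_r}{\bar\gamma}.$$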
Proof. We first show the asymptotics for the number $\tilde n$ of vertices in $G(n,\mathbf{d},\gamma)$.

Let $d \ge 0$, and let $B_d$ be the number of vertices with degree $d$ that are replaced by a clique. Then $B_d$ follows a binomial distribution with parameters $(n_d, \gamma_d)$. By Condition 1-(i), we have $n_d/n \to p_d$, so that the law of large numbers implies $B_d/n \xrightarrow{p} \gamma_d p_d$ (which is still true if $p_d = 0$).

The number of vertices with degree $d$ that are not replaced by a clique is $n_d - B_d$, so we can express the total number $\tilde n$ of vertices in $G(n,\mathbf{d},\gamma)$ as follows:

$$\frac{\tilde n}{n} = \frac{1}{n}\sum_d \left[d B_d + (n_d - B_d)\right] \xrightarrow{p} \sum_d \left[d\gamma_d + (1-\gamma_d)\right]p_d = \bar\gamma,$$

which follows from the previous limits and the uniform summability of $\sum_d d n_d/n$ implied by Condition 1-(iii).

Similarly, the total number of vertices with degree $r$ ($r \ge 0$) in $G(n,\mathbf{d},\gamma)$ is $rB_r + (n_r - B_r)$, so the proportion of vertices with degree $r$ in $G(n,\mathbf{d},\gamma)$ is:

$$\frac{\tilde n_r}{\tilde n} = \frac{rB_r + (n_r - B_r)}{n}\cdot\frac{n}{\tilde n} \xrightarrow{p} \frac{\left[r\gamma_r + (1-\gamma_r)\right]p_r}{\bar\gamma} = \tilde p_r,$$

which concludes the proof. □

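In other words, if $\tilde D_n$ is the degree of a vertex chosen uniformly at random in $G(n,\mathbf{d},\gamma)$, then Proposition 2 implies that $\tilde D_n \xrightarrow{d} \tilde D$, where $\tilde D$ is a random variable with distribution $(\tilde p_r)_{r\ge0}$. In the particular case where $\gamma_r = \gamma$ for all $r$, we have $\bar\gamma = \gamma\lambda + 1 - \gamma$, and the mean degree of the graph is then

$$\tilde\lambda := \mathbb{E}[\tilde D] = \frac{\gamma\,\mathbb{E}[D(D-1)] + \lambda}{\gamma\,\mathbb{E}[D-1] + 1},$$

which is a non-decreasing function of $\gamma$: its derivative with respect to $\gamma$ has the sign of $\mathbb{E}[D(D-1)] - \lambda(\lambda-1) = \mathrm{Var}(D) \ge 0$.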
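The limits in Proposition 2 are weighted sums, so they can be evaluated numerically for a given pair $(p, \gamma)$. The sketch below assumes that distributions are passed as Python dicts mapping a degree $r$ to $p_r$ (and to $\gamma_r$). The function names are ours and hypothetical. The second function evaluates the closed form for $\tilde\lambda$ with a constant $\gamma$, so the two routes can be compared.

```python
def clustered_degree_distribution(p, gamma):
    """Limits of Proposition 2: returns (gamma_bar, p_tilde).

    p: {r: p_r} for the base graph G(n, d); gamma: {r: gamma_r},
    missing keys being treated as gamma_r = 0.
    """
    def g(r):
        return gamma.get(r, 0.0)

    # a degree-r vertex yields r new vertices w.p. gamma_r, otherwise 1
    weights = {r: (r * g(r) + 1.0 - g(r)) * pr for r, pr in p.items()}
    gamma_bar = sum(weights.values())
    return gamma_bar, {r: w / gamma_bar for r, w in weights.items()}


def mean_degree_constant_gamma(p, g):
    """Closed form (g E[D(D-1)] + lambda) / (g E[D-1] + 1) when gamma_r = g."""
    lam = sum(r * pr for r, pr in p.items())
    edd = sum(r * (r - 1) * pr for r, pr in p.items())
    return (g * edd + lam) / (g * (lam - 1) + 1)
```

For $p = \{1: 0.5, 3: 0.5\}$ and $\gamma_r = 0.4$, both routes give $\bar\gamma = 1.4$ and a mean degree of $3.2/1.4$. For a regular base graph, $\tilde p = p$ whatever $\gamma$ is.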



2.3 Clustering coefficient

We now compute the clustering coefficient of the graph $G(n,\mathbf{d},\gamma)$. The most common definition of the clustering coefficient of a finite graph is given by:

$$C^{(n)} = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}. \qquad (1)$$

In our model of random graphs, where vertices are exchangeable, this definition can also be interpreted as the conditional probability that there is an edge between two vertices $j$ and $k$, given that they have a common neighbor $i$. At the end of this subsection, we will also consider a local clustering coefficient, defined for a vertex of degree at least 2, and compute the associated clustering coefficient of the graph $G(n,\mathbf{d},\gamma)$ based on this local measure.

Computation of the clustering coefficient. Note that the number of connected triples in $G(n,\mathbf{d},\gamma)$ is simply $\sum_v d_v(d_v-1)/2$. On the other hand, for any vertex $v$ in $G(n,\mathbf{d},\gamma)$, let $P_v$ be the number of pairs of neighbors of $v$ that share an edge. More precisely, if $\mathcal{N}_v$ is the set of neighbors of $v$ (whose cardinality is $|\mathcal{N}_v| = d_v$), then $P_v$ is the number of pairs $\{w,w'\} \subset \mathcal{N}_v$, $w \ne w'$, such that $w$ and $w'$ are also neighbors of each other. Thus 3 times the number of triangles in $G(n,\mathbf{d},\gamma)$ is given by $\sum_v P_v$. Hence, following (1), we define the clustering coefficient $C^{(n)}$ of the graph $G(n,\mathbf{d},\gamma)$ by:

$$C^{(n)} := \frac{2\sum_v P_v}{\sum_v d_v(d_v-1)} \in [0,1].$$
Proposition 3. We consider the model $G(n,\mathbf{d},\gamma)$ for a sequence $\mathbf{d}$ satisfying Condition 1 with probability distribution $p = (p_r)_{r\ge0}$ and clustering parameter $\gamma = (\gamma_r)_{r\ge0}$. Then we have, for the clustering coefficient of $G(n,\mathbf{d},\gamma)$:

$$C^{(n)} \xrightarrow{p} C := \frac{\sum_{r\ge3} r(r-1)(r-2)\,\gamma_r p_r}{\sum_{r\ge2}\left((r-1)\gamma_r + 1\right) r(r-1)\,p_r}.$$

The proof is given at the end of the subsection. We will explore the implications of Proposition 3 in more detail in Section 2.4. Before that, we present an alternative definition of clustering.



Another definition of clustering coefficient. The local clustering coefficient $C_v^{(n)}$ of a vertex $v$ in a graph quantifies how close the vertex and its neighbors are to being a clique. $C_v^{(n)}$ is defined to be the fraction of pairs of neighbors of $v$ that are also neighbors of each other [5]. Using the notations of the previous paragraph, the local clustering coefficient of $v$ is $C_v^{(n)} = 2P_v/[d_v(d_v-1)]$. Note that this definition only makes sense if $d_v \ge 2$, i.e. when $v$ is neither isolated nor a leaf of the graph. The local clustering of vertices with degree one or zero is taken to be zero. The clustering coefficient of the whole network is then defined as the average of the local clustering coefficients of its vertices:

$$C_b^{(n)} := \frac{1}{\tilde n}\sum_v C_v^{(n)},$$

where $\tilde n$ is the number of vertices in the graph, including those of degree one or zero. As observed in [16], since we take the convention that the local clustering is zero for vertices with degree one or zero, the clustering coefficient of the graph can be very low if the graph contains many such vertices, even if the other vertices are highly clustered. We call $C_b^{(n)}$ the biased clustering coefficient, and we refer to [16] for more information. In the next proposition, we give the asymptotics of the biased clustering coefficient in $G(n,\mathbf{d},\gamma)$, but we will mainly deal with definition (1) in the rest of the paper.
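On a finite graph, both definitions can be computed directly from neighbor sets. This is convenient for checking the asymptotic formulas on sampled graphs. The helper below, clustering_coefficients, is hypothetical. It assumes an undirected edge list on vertices 0..n-1, drops self-loops, and collapses repeated edges into one.

```python
from itertools import combinations


def clustering_coefficients(n, edges):
    """Return (C, C_b): definition (1) and the biased (average local) coefficient."""
    nbrs = [set() for _ in range(n)]
    for u, w in edges:
        if u != w:                       # ignore self-loops
            nbrs[u].add(w)
            nbrs[w].add(u)
    sum_p = 0                            # sum over v of P_v
    triples = 0                          # sum over v of d_v (d_v - 1) / 2
    local_sum = 0.0                      # sum over v of the local coefficient
    for v in range(n):
        d = len(nbrs[v])
        p_v = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
        sum_p += p_v
        triples += d * (d - 1) // 2
        if d >= 2:                       # degree 0 or 1: local clustering is 0
            local_sum += p_v / (d * (d - 1) / 2)
    c = sum_p / triples if triples else 0.0
    return c, local_sum / n
```

On a triangle with one pendant vertex attached, it gives $C = 3/5$ and $C_b = 7/12$. The pendant vertex lowers $C_b$ but does not enter the numerator or the denominator of (1).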

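Proposition 4. We consider the model $G(n,\mathbf{d},\gamma)$ for a sequence $\mathbf{d}$ satisfying Condition 1 with probability distribution $p = (p_r)_{r\ge0}$ and clustering parameter $\gamma = (\gamma_r)_{r\ge0}$. Then we have, for the biased clustering coefficient of $G(n,\mathbf{d},\gamma)$:

$$C_b^{(n)} \xrightarrow{p} C_b := \sum_{r\ge3}\frac{p_r\,\gamma_r\,(r-2)}{\bar\gamma},$$

where $\bar\gamma$ is defined in Proposition 2.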
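The limits of Propositions 3 and 4 can be evaluated in the same way. Below is a sketch with distributions passed as dicts and a hypothetical function name. For a $d$-regular base graph with constant $\gamma$, both limits reduce to $(d-2)\gamma/((d-1)\gamma+1)$, which gives a quick sanity check.

```python
def asymptotic_clustering(p, gamma):
    """Limits (C, C_b) of Propositions 3 and 4 for G(n, d, gamma).

    p: {r: p_r} for the base graph G(n, d); gamma: {r: gamma_r}, missing keys = 0.
    """
    def g(r):
        return gamma.get(r, 0.0)

    num = sum(r * (r - 1) * (r - 2) * g(r) * pr for r, pr in p.items() if r >= 3)
    den = sum(((r - 1) * g(r) + 1) * r * (r - 1) * pr
              for r, pr in p.items() if r >= 2)
    gamma_bar = sum((r * g(r) + 1 - g(r)) * pr for r, pr in p.items())
    c = num / den                                              # Proposition 3
    c_b = sum(pr * g(r) * (r - 2) for r, pr in p.items() if r >= 3) / gamma_bar
    return c, c_b                                              # Proposition 4
```

For example, a 4-regular base graph with $\gamma = 1$ gives $C = C_b = 1/2$, and a 3-regular one with $\gamma = 1/2$ gives $C = C_b = 1/4$.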



Proof of Propositions 3 and 4. We recall the following standard result for the random graph $G(n,\mathbf{d})$:

Lemma 5. Let $C^{(n)}$ (resp. $C_b^{(n)}$) be the (resp. biased) clustering coefficient in $G(n,\mathbf{d})$. Then we have $C^{(n)} \xrightarrow{p} 0$ and $C_b^{(n)} \xrightarrow{p} 0$.

We say that a vertex of $G(n,\mathbf{d},\gamma)$ has parent $i \in [n]$ if it belongs to the clique that replaces the vertex $i$ of $G(n,\mathbf{d})$ (when $X(i) = 1$), or if it is $i$ itself (when $X(i) = 0$). We first consider a vertex $v$ of $G(n,\mathbf{d},\gamma)$ whose parent $i$ is such that $X(i) = 1$, and we denote by $K$ the clique replacing $i$. In this case, we can directly compute the local clustering coefficient $C_v^{(n)}$. Indeed, vertex $v$ has $d_i - 1$ neighbors inside $K$, which are all linked together (which gives $(d_i-1)(d_i-2)/2$ edges in total), and one neighbor $v'$ outside $K$, which is not linked to the other neighbors of $v$ (otherwise there would be multiple edges between $i$ and the parent $j$ of $v'$, which is impossible in the simple graph $G(n,\mathbf{d})$). Hence

$$P_v = \frac{(d_i-1)(d_i-2)}{2} \quad\text{and}\quad C_v^{(n)} = \frac{2P_v}{d_i(d_i-1)} = \frac{d_i-2}{d_i},$$

provided that $d_i \ge 2$. If $d_i \in \{0,1\}$, then $C_v^{(n)} = 0$.

We first prove Proposition 4. Since there are $d_i$ such vertices inside a clique, the contribution of the clique $K$ to $\frac{\tilde n}{n}C_b^{(n)} = \frac{1}{n}\sum_v C_v^{(n)}$ is equal to $d_i C_v^{(n)}/n = (d_i-2)/n$. This leads to the following:

$$\frac{\tilde n}{n}C_b^{(n)} = \frac{1}{n}\sum_{d\ge3}(d-2)B_d + \frac{1}{n}\sum_{i:X(i)=0} C_i^{(n)},$$
where $B_d$ is the number of vertices with degree $d$ that are replaced by a clique, as in the proof of Proposition 2. Using that $B_d/n \xrightarrow{p} \gamma_d p_d$, and that $\sum_{i:X(i)=0} C_i^{(n)} = o_p(n)$ (as a consequence of Lemma 5), we obtain $\frac{\tilde n}{n}C_b^{(n)} \xrightarrow{p} \sum_{d\ge3}(d-2)\gamma_d p_d$. Proposition 4 follows by applying Proposition 2.

The end of the proof of Proposition 3 is similar, and follows from the facts that

$$\frac{2}{n}\sum_v P_v \xrightarrow{p} \sum_d d(d-1)(d-2)\,\gamma_d p_d \quad\text{and}\quad \frac{1}{n}\sum_v d_v(d_v-1) \xrightarrow{p} \sum_d d(d-1)\left[d\gamma_d + (1-\gamma_d)\right]p_d.$$

□



2.4 Tunable clustering coefficient with fixed degree distribution 

In this subsection, we show how to use our model to generate graphs with a given degree distribution and clustering. This construction will allow us to compare graphs that have the same degree distribution but different clustering coefficients, and to see the impact of clustering on the epidemic. This analysis is not possible with the random intersection graphs studied in [7]. Indeed, once the clustering coefficient and the mean degree of the graph are fixed in [7], the degree distribution is completely determined and has to be compound Poisson. In particular, the variance (like all higher moments) of the degree distribution is also fixed.

Our model has much more freedom in terms of the graphs that we can generate, but it still has one limitation: for a given degree distribution, there is a constraint on the maximal value of the clustering coefficient of our model. As a simple example, note that our model is not able to generate 2-regular graphs with positive clustering coefficient, since replacing a vertex of degree 2 by a clique of size 2 creates no triangle. In order to provide a graph with a given asymptotic degree distribution $\tilde p$ and a positive clustering coefficient, we need the following assumptions on $\tilde p$:

Condition 6. We assume that the probability distribution $\tilde p$ satisfies:

(i) $\sum_r r^2\tilde p_r < \infty$;
(ii) $\sum_{r\ge3}\tilde p_r > 0$;
(iii) $\tilde p_0 = 0$.

Under these conditions, we have the following proposition: 

Proposition 7. Let $\tilde p = (\tilde p_r)_{r\ge0}$ be a probability distribution satisfying Condition 6. We define the maximal clustering coefficient as

$$C^{\max} := \frac{\sum_{r\ge3}(r-1)(r-2)\,\tilde p_r}{\sum_{r\ge2} r(r-1)\,\tilde p_r}. \qquad (2)$$

Then, for any value $0 \le C \le C^{\max}$, there exist a sequence $\mathbf{d}$ satisfying Condition 1 with probability distribution $p = (p_r)_{r\ge0}$ and a value $\gamma \in [0,1]$ such that the model $G(n,\mathbf{d},\gamma)$ has asymptotic degree distribution $\tilde p$ and asymptotic clustering coefficient $C$.

More precisely, $\gamma$ is the solution of the equation $C(\gamma) = C$, where

$$C(\gamma) := \frac{\sum_{r\ge3} r(r-1)(r-2)\,\frac{\gamma}{(r-1)\gamma+1}\,\tilde p_r}{\sum_{r\ge2} r(r-1)\,\tilde p_r}. \qquad (3)$$

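Let $F(\gamma) := \sum_{r\ge1}\frac{r}{(r-1)\gamma+1}\,\tilde p_r$ for $\gamma \in [0,1]$, and set

$$\lambda := \frac{(1-\gamma)F(\gamma)}{1-\gamma F(\gamma)} \ \text{ if } \gamma < 1, \qquad \lambda := \Big(\sum_{r\ge1}\tilde p_r/r\Big)^{-1} \ \text{ if } \gamma = 1. \qquad (4)$$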
Then we can define $p$ as:

$$p_r := \frac{\tilde p_r\left[(\lambda-1)\gamma + 1\right]}{(r-1)\gamma + 1} \ \text{ for all } r \ge 1, \quad\text{and}\quad p_0 := 0. \qquad (5)$$



Proof. We first show that $\gamma$, $\lambda$ and $p$ are well defined, and that $p$ is a probability distribution that satisfies Condition 1.

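The solution $\gamma$ of equation (3) exists and is unique. Indeed, we have the following derivative for $C(\gamma)$:

$$C'(\gamma) = \frac{\sum_{r\ge3} r(r-1)(r-2)\,\frac{1}{\left((r-1)\gamma+1\right)^2}\,\tilde p_r}{\sum_{r\ge2} r(r-1)\,\tilde p_r},$$

which is positive since $\sum_{r\ge3}\tilde p_r > 0$. Hence the clustering coefficient $C(\gamma)$ is an increasing function of $\gamma$, taking values between $C(0) = 0$ and $C(1) = C^{\max}$. Since $C \in [0, C^{\max}]$, there exists a unique $\gamma \in [0,1]$ such that $C(\gamma) = C$.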

The constant $\lambda$ is well defined: if $\gamma < 1$, then for all $r \ge 1$, $(r-1)\gamma + 1 > r\gamma$, which leads to $F(\gamma) < 1/\gamma$, i.e. $1 - \gamma F(\gamma) > 0$.

In addition, we have $\sum_r r p_r = \lambda$. Indeed, if $\gamma = 1$, then $r p_r = \lambda\tilde p_r$, and summing over $r \ge 1$ gives the result. If $\gamma \ne 1$, we have $\sum_r r p_r = \left[(\lambda-1)\gamma + 1\right]F(\gamma) = \lambda$, where the first equality comes from (5) and the second one from (4).

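We can easily verify that $p$ is a probability distribution. For all $r \ge 0$, we have $\left((r-1)\gamma + 1\right)p_r = \tilde p_r\left[(\lambda-1)\gamma + 1\right]$ (due to (5) and the fact that $p_0 = \tilde p_0 = 0$). Summing over all $r \ge 0$ and using $\sum_r r p_r = \lambda$ gives $(1-\gamma)\sum_r p_r = 1-\gamma$, hence $\sum_r p_r = 1$ when $\gamma < 1$. When $\gamma = 1$, $\sum_r p_r = \lambda\sum_{r\ge1}\tilde p_r/r = 1$ follows directly from (4) and (5).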

We now consider the graph $G(n,\mathbf{d},\gamma)$, with $\mathbf{d}$ given as follows. For each $n$, let $\mathbf{d}$ be such that $|\{i : d_i = r\}| = \lfloor n p_r\rfloor$ for all $r \ge 0$. In addition, we adjust the value of $d_n$ (for instance) so that $\sum_i d_i$ is even. Then $\sum_i d_i^2 \approx \sum_r r^2\lfloor n p_r\rfloor = O(n)$, due to Condition 6-(i) and equation (5). Hence Condition 1 is satisfied by $\mathbf{d}$.

We can verify that $G(n,\mathbf{d},\gamma)$ has asymptotic degree distribution $\tilde p$, using Proposition 2 and equation (5), and asymptotic clustering coefficient $C$, using Proposition 3 and equations (3) and (5). □
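Since $C(\gamma)$ is increasing, the construction of Proposition 7 can be carried out numerically: first find $\gamma$ by bisection, then apply (4) and (5). This is a sketch under the assumptions of Condition 6, with our own hypothetical function names. Bisection is simply one convenient root finder; nothing in the text indicates it is the method used for the figures.

```python
def clustering_of_gamma(pt, g):
    """C(gamma) of equation (3) for a target distribution pt = {r: tilde p_r}."""
    num = sum(r * (r - 1) * (r - 2) * g / ((r - 1) * g + 1) * q
              for r, q in pt.items() if r >= 3)
    den = sum(r * (r - 1) * q for r, q in pt.items() if r >= 2)
    return num / den


def tune_clustering(pt, target_c, tol=1e-12):
    """Return (gamma, lam, p) following Proposition 7.

    Assumes pt satisfies Condition 6 (no mass at 0, some mass on r >= 3).
    """
    c_max = clustering_of_gamma(pt, 1.0)              # C^max = C(1), eq. (2)
    if not 0.0 <= target_c <= c_max:
        raise ValueError("target clustering must lie in [0, C^max]")
    if target_c == c_max:
        g = 1.0
    else:
        lo, hi = 0.0, 1.0                             # C is increasing in gamma
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if clustering_of_gamma(pt, mid) < target_c:
                lo = mid
            else:
                hi = mid
        g = (lo + hi) / 2
    if g < 1.0:
        F = sum(r / ((r - 1) * g + 1) * q for r, q in pt.items() if r >= 1)
        lam = (1 - g) * F / (1 - g * F)               # equation (4), gamma < 1
    else:
        lam = 1.0 / sum(q / r for r, q in pt.items() if r >= 1)  # gamma = 1
    p = {r: q * ((lam - 1) * g + 1) / ((r - 1) * g + 1)
         for r, q in pt.items() if r >= 1}            # equation (5)
    return g, lam, p
```

For a 3-regular target, $C(\gamma) = \gamma/(2\gamma+1)$, so the target $C = 0.2$ gives $\gamma = 1/3$, $\lambda = 3$ and $p = \tilde p$. Near $C^{\max}$, the $\gamma < 1$ branch of (4) becomes a ratio of two small numbers, so a careful implementation may want to treat that regime separately.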

A similar result can be proved with the biased clustering coefficient. In that case, $C(\gamma)$ is replaced by $C_b(\gamma) := \sum_{r\ge3}\frac{(r-2)\gamma}{(r-1)\gamma+1}\,\tilde p_r$, and $C^{\max}$ by $C_b^{\max} := \sum_{r\ge3}\frac{r-2}{r}\,\tilde p_r$. We can see in Figure 2 that the interval of reachable clustering values is larger for the first notion of clustering. This illustration uses a power-law degree distribution with exponential cutoff for the distribution $\tilde p$: there exist a power $\tau > 0$ and a cutoff $\kappa > 0$ such that, for all $r \ge 1$, $\tilde p_r = c(\tau,\kappa)\, r^{-\tau} e^{-r/\kappa}$, where $c(\tau,\kappa) = 1/\sum_{s\ge1} s^{-\tau}e^{-s/\kappa}$ is a normalizing constant. The cutoff $\kappa$ allows Condition 6 to be satisfied for any power $\tau > 0$; in all figures, we take $\kappa = 50$. In order to increase the mean degree of the graph in Figure 2, we decrease the power $\tau$.
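The power-law distribution with exponential cutoff and the two maximal clustering values that bound the curves of Figure 2 can be tabulated as below. This is a hypothetical sketch. Truncating the support at r_max is our own numerical assumption: for $\kappa = 50$, the neglected tail is of order $e^{-40}$. The function names are ours.

```python
import math


def power_law_cutoff(tau, kappa=50.0, r_max=2000):
    """tilde p_r proportional to r^(-tau) exp(-r / kappa), for 1 <= r <= r_max.

    Assumption: the truncation at r_max is ours, not part of the text.
    """
    w = {r: r ** (-tau) * math.exp(-r / kappa) for r in range(1, r_max + 1)}
    z = sum(w.values())                   # numerical version of 1 / c(tau, kappa)
    return {r: x / z for r, x in w.items()}


def max_clustering(pt):
    """(C^max, C_b^max) for a target distribution pt = {r: tilde p_r}."""
    c_max = (sum((r - 1) * (r - 2) * q for r, q in pt.items() if r >= 3)
             / sum(r * (r - 1) * q for r, q in pt.items() if r >= 2))
    cb_max = sum((r - 2) / r * q for r, q in pt.items() if r >= 3)
    return c_max, cb_max
```

Sweeping tau and recording the mean degree sum(r * q for r, q in pt.items()) together with max_clustering(pt) gives the upper envelope of the reachable clustering values for both notions.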

3 Diffusion threshold for random graphs with clustering 
3.1 Diffusion model 

In this section, we study a simple diffusion model depending on a single parameter $\pi \in [0,1]$. For a given graph $G$, the dynamics of the diffusion are as follows. Some set of nodes $S$ starts out active, and all other nodes are inactive. When a node becomes active, each of her neighbors becomes active with probability $\pi$, independently of the others. The final state of the diffusion can also be described in terms of a bond percolation process on the graph $G$. Delete each edge with probability $1-\pi$, independently of all other edges, and denote by $G_\pi$ the resulting graph. Then any node in $S$ activates all nodes in its connected component in $G_\pi$.
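The bond-percolation description gives a direct way to sample the final active set. Below is a minimal sketch, assuming an edge list on vertices 0..n-1. The function name is hypothetical. It merges, with a union-find structure, the endpoints of every edge kept with probability $\pi$, and then returns every vertex that shares a component with a seed.

```python
import random


def diffusion_final_set(n, edges, seeds, pi, rng):
    """Final active set: union of the components of G_pi that contain a seed."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    for u, w in edges:
        if rng.random() < pi:                 # the edge survives percolation
            parent[find(u)] = find(w)
    seed_roots = {find(s) for s in seeds}
    return {v for v in range(n) if find(v) in seed_roots}
```

Combined with a sampler for $G(n,\mathbf{d},\gamma)$, repeated calls with a single seed give an empirical estimate of the probability and size of a global diffusion.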




Figure 2: Reachable clustering values for both notions of clustering coefficient, with respect to the mean degree $\tilde\lambda$ of $G(n,\mathbf{d},\gamma)$ (when the graph $G(n,\mathbf{d},\gamma)$ has power-law degree distribution $\tilde p_r \propto r^{-\tau}e^{-r/\kappa}$)



3.2 Phase transition for the diffusion with a single activation 

In this subsection, we consider diffusion starting from one active node, all other nodes being inactive, and we derive conditions under which a single initially active node can activate a large fraction of the population in $G = G(n,\mathbf{d},\gamma)$. This problem corresponds to the existence of a 'giant component' in the random graph obtained after bond percolation.

In order to state our result, we first need to recall some basic results about random graphs of small order. For $d \in \mathbb{N}$, let $K_d$ be the complete graph on $d$ vertices, denoted $\{1,\dots,d\}$, with $d(d-1)/2$ edges. For $\pi \in [0,1]$, we denote by $K_d(\pi)$ the random graph obtained from $K_d$ after bond percolation with parameter $\pi$, i.e. each edge of $K_d$ is kept independently of the others with probability $\pi$, and removed otherwise.

We need the probability that the component of $K_d(\pi)$ containing vertex 1 has $k$ vertices, denoted by $f(d,k,\pi)$. Note that $f(d,d,\pi)$ is simply the probability that $K_d(\pi)$ is connected, which has been computed in [10]. Indeed, simple computations show that we have the recurrence relation

$$f(d,d,\pi) = 1 - \sum_{k=1}^{d-1}\binom{d-1}{k-1} f(k,k,\pi)\,(1-\pi)^{k(d-k)},$$

$$f(d,k,\pi) = \binom{d-1}{k-1} f(k,k,\pi)\,(1-\pi)^{k(d-k)} \quad \text{for any } k < d. \qquad (6)$$

We now define, for $d \in \mathbb{N}$ and $\pi \in [0,1]$, the random variable $\mathcal{K}(d,\pi,\gamma)$ by

$$\mathbb{P}\left(\mathcal{K}(d,\pi,\gamma) = k\right) = (1-\gamma_d)\,\mathbb{1}(d=k) + \gamma_d\, f(d,k,\pi),$$

where $f$ is defined in (6). In words, $\mathcal{K}(d,\pi,\gamma)$ is equal to $d$ with probability $1-\gamma_d$, and to the size of the component of $K_d(\pi)$ containing vertex 1 with the remaining probability.
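Recursion (6) and the law of $\mathcal{K}(d,\pi,\gamma)$ are straightforward to evaluate with memoization. This is a sketch, assuming $d \ge 1$. The names f and k_distribution are ours; f simply mirrors the notation of (6). Two quick checks are $f(2,2,\pi) = \pi$ and $f(3,3,\pi) = 3\pi^2 - 2\pi^3$.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def f(d, k, pi):
    """P(the component of vertex 1 in K_d(pi) has k vertices), recursion (6)."""
    if not 1 <= k <= d:
        return 0.0
    if k < d:
        # choose the k - 1 other members, ask them to form a connected K_k(pi),
        # and ask all k (d - k) edges leaving this set to be deleted
        return comb(d - 1, k - 1) * f(k, k, pi) * (1 - pi) ** (k * (d - k))
    # k == d: probability that K_d(pi) is connected
    return 1.0 - sum(f(d, j, pi) for j in range(1, d))


def k_distribution(d, pi, gamma_d):
    """Law of K(d, pi, gamma) as a dict {k: probability}, for d >= 1."""
    dist = {k: gamma_d * f(d, k, pi) for k in range(1, d + 1)}
    dist[d] += 1 - gamma_d        # vertex not replaced by a clique: K = d
    return dist
```

The mean of k_distribution(d, pi, gamma_d) is the quantity $\mathbb{E}[\mathcal{K}(d,\pi,\gamma)]$ that enters the threshold equation of Theorem 8.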
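In addition, set for all $k \ge 1$:

$$\theta_k := p_k(1-\gamma_k) + \sum_{d\ge k}\frac{d}{k}\, f(d,k,\pi)\, p_d\gamma_d, \qquad (7)$$

$$\mu := \sum_{\ell}\ell\,\theta_\ell, \qquad (9)$$

$$\sigma_k := p_k(1-\gamma_k) + \sum_{d} d\, f(d,k,\pi)\, p_d\gamma_d. \qquad (10)$$

Using the notation $\bar\gamma$ defined in Proposition 2, we also define (omitting the dependence on $p$ and $\gamma$):

$$L(z) := \sum_s \frac{\sigma_s}{\bar\gamma}\left[1 - (1-\pi+\pi z)^s\right],$$

$$h(z) := \sum_s s\,\theta_s\,(1-\pi+\pi z)^s,$$

$$\xi := \sup\left\{z \in [0,1) : \mu z(1-\pi+\pi z) = h(z)\right\}.$$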

For a graph $G = (V,E)$ and a parameter $\pi \in [0,1]$, we denote by $C^{(1)}(\pi)$ the size of the largest component in the bond-percolated graph $G_\pi$.

Theorem 8. Consider the random graph $G = G(n,\mathbf{d},\gamma)$ for a sequence $\mathbf{d}$ satisfying Condition 1 with probability distribution $p = (p_r)_{r\ge0}$ and clustering parameter $\gamma = (\gamma_r)_{r\ge0}$. Let $D^*$ be a random variable with distribution $p^*$ given by $p^*_{r-1} = r p_r/\lambda$ for all $r \ge 1$. We define $\pi_c$ as the solution of the equation

$$\pi_c\,\mathbb{E}\left[\mathcal{K}(D^*+1,\pi_c,\gamma) - 1\right] = 1.$$

(i) If $\pi > \pi_c$, we have (with the notations above) $\xi \in (0,1)$. In addition, the asymptotic size of the largest component of the percolated graph $G_\pi$ obtained from $G(n,\mathbf{d},\gamma)$ is given by

$$C^{(1)}(\pi)/\tilde n \xrightarrow{p} L(\xi) > 0.$$

(ii) If $\pi < \pi_c$, we have $C^{(1)}(\pi) = o_p(n)$.
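Part (i) of Theorem 8 can be evaluated numerically from the quantities defined above. The sketch below is ours and uses hypothetical function names. It assumes $p_0 = 0$, repeats recursion (6) so that the block is self-contained, and computes $\xi$ as the smallest fixed point of $z \mapsto \sum_s s\theta_s(1-\pi+\pi z)^{s-1}/\mu$, reached by monotone iteration from $z = 0$. This fixed-point form comes from dividing the defining equation of $\xi$ by $1-\pi+\pi z$. Choosing the smallest fixed point as the relevant root is our assumption.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def f(d, k, pi):
    """Recursion (6), valid for 1 <= k <= d."""
    if k < d:
        return comb(d - 1, k - 1) * f(k, k, pi) * (1 - pi) ** (k * (d - k))
    return 1.0 - sum(f(d, j, pi) for j in range(1, d))


def giant_fraction(p, gamma, pi, iters=10_000):
    """Approximate L(xi): fraction of the vertices of G(n, d, gamma) in the largest
    component after bond percolation (close to 0 in the subcritical regime).

    p: {r: p_r} with p_0 = 0 assumed; gamma: {r: gamma_r}, missing keys = 0.
    """
    def g(r):
        return gamma.get(r, 0.0)

    ks = range(1, max(p) + 1)
    theta = {k: p.get(k, 0.0) * (1 - g(k))
             + sum(d / k * f(d, k, pi) * pd * g(d) for d, pd in p.items() if d >= k)
             for k in ks}                                    # equation (7)
    sigma = {k: p.get(k, 0.0) * (1 - g(k))
             + sum(d * f(d, k, pi) * pd * g(d) for d, pd in p.items() if d >= k)
             for k in ks}                                    # equation (10)
    mu = sum(k * t for k, t in theta.items())                # equation (9)
    gamma_bar = sum((r * g(r) + 1 - g(r)) * pr for r, pr in p.items())
    z = 0.0                                                  # monotone iteration
    for _ in range(iters):
        u = 1 - pi + pi * z
        z = sum(s * t * u ** (s - 1) for s, t in theta.items()) / mu
    u = 1 - pi + pi * z
    return sum(w * (1 - u ** k) for k, w in sigma.items()) / gamma_bar
```

For a 3-regular graph without cliques and $\pi = 0.75$, the fixed point is $\xi = 1/9$, so $1-\pi+\pi\xi = 1/3$ and the fraction is $1 - (1/3)^3 = 26/27$.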

We can guess the value of the diffusion threshold $\pi_c$ using a branching process approximation (see the Appendix).

Note that in the particular case where $\gamma_r = 0$ for all $r$, we have $\mathcal{K}(d,\pi,\mathbf{0}) = d$, so that we get $\pi_c = \mathbb{E}[D]/\mathbb{E}[D(D-1)]$, where $D$ is the typical degree in the random graph, and our result reduces to a standard result in the random graph literature (see Theorem 3.9 in [13]).
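The threshold equation of Theorem 8 can be solved numerically. Below is a sketch with hypothetical function names. It repeats recursion (6) to stay self-contained, and assumes that $\pi \mapsto \pi\,\mathbb{E}[\mathcal{K}(D^*+1,\pi,\gamma) - 1]$ crosses 1 exactly once on $[0,1]$. Bisection is our choice of root finder.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def f(d, k, pi):
    """Recursion (6), valid for 1 <= k <= d."""
    if k < d:
        return comb(d - 1, k - 1) * f(k, k, pi) * (1 - pi) ** (k * (d - k))
    return 1.0 - sum(f(d, j, pi) for j in range(1, d))


def mean_excess(p, gamma, pi):
    """pi * E[K(D* + 1, pi, gamma) - 1], where P(D* + 1 = r) = r p_r / lambda."""
    lam = sum(r * pr for r, pr in p.items())
    total = 0.0
    for r, pr in p.items():
        if r == 0:
            continue
        g = gamma.get(r, 0.0)
        mean_clique = sum(k * f(r, k, pi) for k in range(1, r + 1))
        mean_k = (1 - g) * r + g * mean_clique        # E[K(r, pi, gamma)]
        total += r * pr / lam * (mean_k - 1)
    return pi * total


def diffusion_threshold(p, gamma, tol=1e-12):
    """Bisection for pi_c such that mean_excess(p, gamma, pi_c) = 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_excess(p, gamma, mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With $\gamma = 0$, it returns $\pi_c = 1/2$ for a 3-regular graph and $\pi_c = 2/3$ for $p = \{1: 0.5, 3: 0.5\}$, matching $\mathbb{E}[D]/\mathbb{E}[D(D-1)]$. For a 3-regular graph with $\gamma_3 = 1$, the threshold moves above $1/2$ (roughly 0.64), which is in line with Proposition 9 below.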

The case where a positive fraction of individuals belongs to $S$ (not only a single node) is discussed in Subsection 3.5. We now study the effect of clustering on the diffusion with a single activation.



3.3 Effect of clustering on the diffusion for regular graphs 

In this paragraph, we consider $d$-regular graphs ($d \ge 3$), so that $p_r = \mathbb{1}(r=d)$ for all $r \ge 0$. In this case, adding cliques does not change the asymptotic degree distribution of the graph $G(n,\mathbf{d},\gamma)$, and $\tilde p_r = \mathbb{1}(r=d) = p_r$. In addition, we assume that $\gamma_r = \gamma$ for all $r \ge 0$ (each vertex of $G(n,\mathbf{d})$ is replaced by a clique with probability $\gamma$). We are interested in the effect of clustering on the diffusion threshold $\pi_c$ on the one hand, and on the epidemic size on the other hand.

We have the following result for the diffusion threshold in regular graphs: 
Figure 3: On the left: evolution of the diffusion threshold with respect to the clustering coefficient for $d$-regular graphs. On the right: evolution of the epidemic size with respect to the clustering coefficient for $d$-regular graphs (with infection probability $\pi = 0.22$).



Proposition 9. Let $d \ge 3$. We consider the asymptotic degree distribution $\tilde p = (\mathbb{1}(r=d))_{r\ge0}$. Let $0 \le C^{(1)} < C^{(2)} \le C^{\max} = (d-2)/d$. For each $j = 1,2$, let $(\mathbf{d}_j, p_j, \gamma_j)$ be chosen according to Proposition 7 such that $G^{(j)} = G(n,\mathbf{d}_j,\gamma_j)$ has asymptotic degree sequence $\tilde p$ and asymptotic clustering coefficient $C^{(j)}$. Let $\pi_c^{(j)}$ be the diffusion threshold defined in Theorem 8 for the random graph $G^{(j)}$, $j = 1,2$. Then we have:

$$\pi_c^{(1)} \le \pi_c^{(2)}.$$

In other words, in our graph model, the diffusion threshold of a random $d$-regular graph increases as the clustering coefficient increases.



Proof. Both notions of clustering coefficient considered in Section 2.3 are the same for random $d$-regular graphs, and we have $C(\gamma) = \frac{\gamma(d-2)}{1+\gamma(d-1)}$, which is increasing in $\gamma$. In particular, $C^{(1)} < C^{(2)}$ implies that $\gamma_1 < \gamma_2$.

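According to Theorem 8, the diffusion threshold $\pi_c^{(j)}$, $j = 1,2$, is the solution of the following equation:

$\pi\,\mathbb{E}\left[\mathcal{K}(d,\pi,\gamma_j) - 1\right] = 1. \quad (11)$

Using the definition of $\mathcal{K}$, equation (11) becomes $\pi = F(\pi,\gamma_j)$, where $F(\pi,\gamma) := 1/(\mathcal{K}(d,\pi,\gamma) - 1)$ for $(\pi,\gamma) \in [0,1]^2$.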



Let $\pi \in [0,1]$. Then $F(\pi,\gamma_1) \le F(\pi,\gamma_2)$ since $\gamma_1 < \gamma_2$ and $\sum_{k=1}^{d} k f(d,k,\pi) \le d$. Indeed, $f(d,k,\pi)$, $k \le d$, is the probability that the connected component of a vertex inside $K_d(\pi)$ has $k$ vertices in $K_d(\pi)$. Hence $\sum_k k f(d,k,\pi)$ is the mean size of that component inside $K_d(\pi)$, so that $\sum_k k f(d,k,\pi) \le d$.

Therefore the curve $\Gamma_2$ of $\pi \mapsto F(\pi,\gamma_2)$ is above the curve $\Gamma_1$ of $\pi \mapsto F(\pi,\gamma_1)$. Let $A_1$ (resp. $A_2$) be the intersection between $\Gamma_1$ (resp. $\Gamma_2$) and the first bisector. Both functions are continuous and nonincreasing in $\pi$ (the mean component size $\sum_k k f(d,k,\pi)$ is nondecreasing in $\pi$), thus the first coordinate of $A_1$ is less than or equal to the one of $A_2$, that is to say $\pi_c^{(1)} \le \pi_c^{(2)}$. □

More precisely, Figure 3 (on the left) shows how the diffusion threshold increases with the clustering coefficient, for different values of $d$: in other words, clustering decreases the range of $\pi$ ($\pi \in (\pi_c, 1]$) for which a single individual can turn a positive proportion of the population into infected individuals.
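The monotonicity in Proposition 9 can be reproduced numerically. The sketch below computes $f(d,k,\pi)$ exactly, using the standard recursion for the probability that a bond-percolated clique is connected, and then solves equation (11) by bisection. It assumes $\mathcal{K}(d,\pi,\gamma) = (1-\gamma)d + \gamma\sum_{k=1}^{d} k f(d,k,\pi)$: this form is our reading of the definition of $\mathcal{K}$, chosen because it satisfies $\mathcal{K}(d,\pi,0) = d$ and the bound $\sum_k k f(d,k,\pi) \le d$ used in the proof; it is not quoted from the paper. All function names are hypothetical.

```python
from math import comb


def connected_probs(d, pi):
    """c[m] = probability that a clique of size m, keeping each edge w.p. pi, is connected."""
    c = [0.0, 1.0]
    for m in range(2, d + 1):
        # the component of a fixed vertex has exactly j < m vertices with probability
        # comb(m-1, j-1) * c[j] * (1-pi)^(j(m-j)); these are the disconnected cases
        disconnected = sum(
            comb(m - 1, j - 1) * c[j] * (1 - pi) ** (j * (m - j)) for j in range(1, m)
        )
        c.append(1.0 - disconnected)
    return c


def f(d, k, pi):
    """Probability that a given vertex of the percolated clique K_d(pi) lies in a component of k vertices."""
    c = connected_probs(d, pi)
    return comb(d - 1, k - 1) * c[k] * (1 - pi) ** (k * (d - k))


def K(d, pi, gamma):
    """ASSUMED form: a plain vertex contributes d, a clique its mean component size."""
    mean_component = sum(k * f(d, k, pi) for k in range(1, d + 1))
    return (1 - gamma) * d + gamma * mean_component


def diffusion_threshold_regular(d, gamma, iters=60):
    """Solve pi * (K(d, pi, gamma) - 1) = 1 by bisection; the left side increases in pi."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid * (K(d, mid, gamma) - 1) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for gamma in (0.0, 0.5, 1.0):
    print(gamma, round(diffusion_threshold_regular(3, gamma), 4))
```

With $\gamma = 0$ the root is $1/(d-1)$, and it moves to the right as $\gamma$ grows, which is the qualitative behaviour shown on the left of Figure 3.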
Figure 4: Evolution of the diffusion threshold with respect to the clustering coefficient $C$ in a graph with mean degree $\lambda$ (for a fixed power law degree distribution).

In addition, the epidemic size also decreases with the clustering: in Figure 3 (on the right), we plot the ratio of the size of the largest connected component in the percolated graph to the size of the whole population. When the starting infected individual is a vertex chosen uniformly at random, this ratio also corresponds to the probability of explosion. Hence, as the clustering increases, it 'inhibits' the diffusion process. These results are in accordance with .

These results are intuitive (for $d$-regular graphs) in the sense that the removal of edges inside cliques can stop the diffusion inside a clique in the graph $G(n,d,\gamma)$, while this phenomenon does not occur in the original graph $G(n,d)$.

3.4 Effect of clustering on the diffusion for graphs with power law degree distribution

In Figure 4, we consider a power law degree distribution with exponential cutoff: $\tilde p_r \propto r^{-\tau}e^{-r/50}$, with parameters $\tau = 2.9$, $\tau = 2.5$, $\tau = 1.81$, $\tau = 1.3$, so that the mean degree is respectively $\lambda \approx 1.37$, $\lambda \approx 1.65$, $\lambda \approx 3.22$, $\lambda \approx 7.3$. We plot the diffusion threshold $\pi_c$ for the graph given by Proposition 7 when the degree distribution is $\tilde p$ and the clustering coefficient varies from 0 to $C^{\max}$. We observe the same phenomenon as the one we proved for $d$-regular graphs, i.e. clustering decreases the range of $\pi$ ($\pi \in (\pi_c, 1]$) for which a single individual can turn a positive proportion of the population into infected individuals.
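For reference, the mean degrees quoted above can be recomputed from the truncated distribution. This sketch assumes the cutoff scale $50$ in $e^{-r/50}$ (our reading of the extracted formula) and truncates the support at a hypothetical `rmax = 5000`, which is harmless because of the exponential cutoff.

```python
import math


def power_law_with_cutoff(tau, cutoff=50.0, rmax=5000):
    """Normalized p_r proportional to r^(-tau) * exp(-r / cutoff), for 1 <= r <= rmax."""
    weights = {r: r ** (-tau) * math.exp(-r / cutoff) for r in range(1, rmax + 1)}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def mean_degree(p):
    """Mean of a degree distribution given as {degree: probability}."""
    return sum(r * pr for r, pr in p.items())


for tau in (2.9, 2.5, 1.81, 1.3):
    print(tau, round(mean_degree(power_law_with_cutoff(tau)), 2))
```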

3.5 Phase transition for the diffusion with degree based activation 

In this subsection, we allow a positive fraction of nodes to be active at the beginning of the diffusion process. More precisely, on a given graph $G$, the set $S$ of initial active nodes is random, and each node of degree $d$ in $G$ belongs to $S$ with some probability $\alpha_d \ge 0$, independently for each node. We set $\alpha = (\alpha_d)_{d \ge 0}$.
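A direct simulation makes the degree-based activation concrete. The sketch below assumes that the diffusion of Subsection 3.1 can be realized as bond percolation with parameter $\pi$ (each edge transmits independently, as in the percolated graph $G_\pi$ used in the proofs) and that the graph is given as an adjacency dictionary; `alpha` is a hypothetical callable returning $\alpha_d$.

```python
import random
from collections import deque


def diffusion_with_activation(adj, alpha, pi, rng=random):
    """Final number of active nodes in a sketch of the diffusion with degree-based activation.

    adj   : dict mapping each node to the list of its neighbours (simple undirected graph)
    alpha : function d -> probability that a node of degree d is initially active
    pi    : infection probability, used as the probability of keeping each edge
    """
    # bond percolation: decide once per edge whether it transmits
    open_edge = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            key = frozenset((u, v))
            if key not in open_edge:
                open_edge[key] = rng.random() < pi
    # independent degree-based seeding
    active = {v for v, nbrs in adj.items() if rng.random() < alpha(len(nbrs))}
    # everything reachable from the seeds through open edges becomes active
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in active and open_edge[frozenset((u, v))]:
                active.add(v)
                queue.append(v)
    return len(active)
```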

Using the notation $\tilde\gamma$ defined in Proposition 2 and definitions (7) to (10), we define (omitting the dependence on $\alpha$, $p$ and $\gamma$):

[expressions for $L(z)$ and $h(z)$ illegible in the source]

$\xi := \sup\{z \in [0,1) : \lambda z(1-\pi+\pi z) = h(z)\}. \quad (12)$

Theorem 10. Consider the random graph $G = G(n,d,\gamma)$ for a sequence $d$ satisfying Condition 1 with probability distribution $p = (p_r)_{r=0}^{\infty}$ and clustering parameter $\gamma = (\gamma_r)_{r=0}^{\infty}$, and an activation set $S$ drawn according to the distribution $\alpha$. Then, for the diffusion model defined in Subsection 3.1, if $\xi = 0$, or if $\xi \in (0,1]$ and further $\xi$ is such that there exists $\varepsilon > 0$ with $\lambda z(1-\pi+\pi z) < h(z)$ for $z \in (\xi - \varepsilon, \xi)$, the size $C(\pi,\alpha)$ of the set of active nodes at the end of the diffusion verifies:

$C(\pi,\alpha)/\tilde n \xrightarrow{p} L(\xi).$

Heuristically, taking $\alpha_s = 0$ for all $s$ in the definitions of the previous theorem allows one to recover the result of Theorem 8.



4 Symmetric threshold model for random graphs with clustering

4.1 Symmetric threshold model 

We now describe the symmetric threshold model on a finite graph $G = (V,E)$, with given thresholds $k(v)$, for $v \in V$. The progressive dynamics of the epidemic on the finite graph $G$ operates as follows: some set of nodes $S$ starts out being active; all other nodes are inactive. Time operates in discrete steps $t = 1, 2, 3, \ldots$. At a given time $t$, any inactive node $v$ becomes active if its number of active neighbors is at least $k(v) + 1$. This in turn may cause other nodes to become active. It is easy to see that the final set of active nodes (after $n$ time steps if the network is of size $n$) only depends on the initial set $S$ (and not on the order of the activations) and can be obtained as follows: set $Y_v = \mathbf{1}(v \in S)$ for all $v$. Then, as long as there exists $v$ such that $\sum_{w \sim v} Y_w > k(v)$, set $Y_v = 1$, where $w \sim v$ means that $v$ and $w$ share an edge in $G$. When this algorithm finishes, the final state of node $v$ is represented by $Y_v$: $Y_v = 1$ if node $v$ is active and $Y_v = 0$ otherwise. In this paper, we do not analyze the dynamics of the epidemics and concentrate on the final state only.
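The final-state algorithm above can be written as a queue-based fixed-point computation. This is a minimal sketch, assuming a graph stored as an adjacency dictionary and thresholds given as a dictionary `k`; instead of rescanning all vertices, it keeps, for every inactive vertex, a counter of active neighbors, which yields the same final set because that set does not depend on the order of activations.

```python
from collections import deque


def final_active_set(adj, k, seeds):
    """Final state of the symmetric threshold model.

    adj   : dict node -> list of neighbours
    k     : dict node -> threshold k(v); v activates once it has > k(v) active neighbours
    seeds : initial active set S
    """
    active = set(seeds)
    count = {v: 0 for v in adj}  # active neighbours seen so far, for inactive vertices
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in active:
                continue
            count[w] += 1
            if count[w] > k[w]:  # at least k(w) + 1 active neighbours
                active.add(w)
                queue.append(w)
    return active
```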

We allow the threshold $k(v)$ of a node $v$ to be a random variable with distribution depending on the degree of $v$, and such that thresholds are independent among nodes. More precisely, for each $s > 0$, let $(t_{s\ell})_{0 \le \ell \le s}$ be a probability distribution. We draw independent thresholds $k(i)$, for $i \in V$. Knowing that the degree $d_i$ of node $i$ is $s$, threshold $k(i)$ is drawn according to the conditional probability distribution $(t_{s\ell})_{0 \le \ell \le s}$: $\mathbb{P}(k(i) = \ell \mid d_i = s) = t_{s\ell}$. We say that random thresholds $k = (k(i))_{i \in V}$ are drawn according to $t = (t_{s\ell})_{s,\ell}$.

To simplify, we define an adaptation of the symmetric threshold model for the random graph $G(n,d,\gamma)$: we draw random thresholds $k$ for each vertex $i$ in the original graph $G(n,d)$. When a vertex $i$ of $G(n,d)$ is replaced by a clique in $G(n,d,\gamma)$, we associate to each vertex inside the clique the original threshold $k(i)$, so that vertices inside a clique have the same threshold (also referred to as the "threshold of the clique"). We still denote by $k(v)$ the threshold of a vertex $v$ in $G(n,d,\gamma)$.
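Sampling thresholds in $G(n,d,\gamma)$ therefore happens at the level of the original vertices. In this sketch, the `parent` map (vertex of $G(n,d,\gamma)$ to original vertex $i$, as in Subsection 5.2) and the list representation of $(t_{s\ell})_\ell$ are assumptions made for illustration.

```python
import random


def draw_clique_thresholds(parent, parent_degree, t, rng=random):
    """Threshold k(v) for every vertex v of G(n, d, gamma).

    parent        : dict vertex v -> index i of the original vertex it comes from
    parent_degree : dict i -> degree d_i of i in G(n, d)
    t             : dict s -> list [t_{s0}, ..., t_{ss}] of probabilities
    One threshold is drawn per original vertex and copied to the whole clique.
    """
    k_parent = {}
    for i, s in parent_degree.items():
        probs = t[s]
        k_parent[i] = rng.choices(range(len(probs)), weights=probs)[0]
    return {v: k_parent[i] for v, i in parent.items()}
```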
4.2 Phase transition for the symmetric threshold model with a single activation



We show that there is a phase transition for a single active node to turn a positive fraction of the population into active nodes. For a graph $G = (V,E)$ and thresholds $k = (k(v))_{v \in V}$, we consider the largest connected component of the induced subgraph in which we keep only vertices of threshold zero. We call the vertices in this component pivotal players: if only one pivotal player becomes active, then the whole set of pivotal players will eventually become active. In particular, if the set of pivotal players is large (i.e. of order $\Theta(|V|)$, as $|V| \to \infty$), then a single pivotal player $u$ can trigger a global cascade (i.e. the number of active nodes at the end of the epidemic starting from $u$ is of order $\Theta(|V|)$).
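The set of pivotal players is a largest connected component of an induced subgraph, so it can be found by a graph search restricted to threshold-zero vertices. A minimal sketch, assuming an adjacency dictionary and a threshold dictionary (ties between components of equal size are broken arbitrarily):

```python
def pivotal_players(adj, k):
    """Largest connected component of the subgraph induced by the threshold-zero vertices."""
    zero = {v for v in adj if k[v] == 0}
    best, seen = set(), set()
    for start in zero:
        if start in seen:
            continue
        # depth-first search that never leaves the threshold-zero vertices
        component, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in zero and w not in seen:
                    seen.add(w)
                    component.add(w)
                    stack.append(w)
        if len(component) > len(best):
            best = component
    return best
```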

We consider the random graph $G = G(n,d,\gamma)$ and random thresholds drawn according to the distribution $t$. For a node $v$, we denote by $C(v,t)$ the final number of active vertices when the initial state consists of only $v$ active and all other nodes inactive. Informally, we say that $C(v,t)$ is the size of the cascade induced by node $v$; if $C(v,t)$ is of order $n$, we say that node $v$ can trigger a global cascade.

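Using the notation $\tilde\gamma$ defined in Proposition 2 and the binomial probabilities $b_{sr}(p)$ defined at the end of Section 1, we set (omitting the dependence on $t$, $p$ and $\gamma$):

$L(z) := \sum_s \big[\tilde\gamma_s + (1-\tilde\gamma_s)\big]\tilde p_s\, t_{s0}(1-z^s) + \sum_s (1-\tilde\gamma_s)\tilde p_s \sum_{\ell \neq 0} t_{s\ell}\Big(1 - \sum_{r \ge s-\ell} b_{sr}(z)\Big),$

$h(z) := \sum_s s p_s \big[t_{s0} z^s + \gamma_s(1-t_{s0}) z\big] + \sum_s p_s(1-\gamma_s) \sum_{\ell \neq 0} \sum_{r \ge s-\ell} r\, t_{s\ell}\, b_{sr}(z),$

$\xi := \sup\{z \in [0,1) : \lambda z^2 = h(z)\}. \quad (13)$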

Theorem 11. Consider the random graph $G(n,d,\gamma)$ for a sequence $d$ satisfying Condition 1 with probability distribution $p = (p_r)_{r=0}^{\infty}$ and clustering parameter $\gamma = (\gamma_r)_{r=0}^{\infty}$. Let $t$ be a family of probability distributions, and $k$ random thresholds drawn according to $t$ in the original graph $G(n,d)$ (i.e. if $i$ is a vertex in $G(n,d)$ replaced by a clique in $G(n,d,\gamma)$, then all vertices in the clique have the same threshold $k(i)$). We call the following condition the cascade condition:

$\sum_r r(r-1)p_r t_{r0} > \sum_r r p_r. \quad (14)$

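Let $\mathcal{P}^{(n)}$ be the set of pivotal players in $G(n,d,\gamma)$.
(i) If the cascade condition (14) is satisfied, then there is a unique $\zeta \in (0,1)$ such that

$\sum_d d p_d t_{d0}\big(1-\zeta^{d-1}\big) = \lambda(1-\zeta), \quad (15)$

and we have:

where $\tilde\gamma$ is defined in Proposition 2. Moreover, for any $u \in \mathcal{P}^{(n)}$, we have whp

$\liminf \frac{C(u,t)}{\tilde n} \ge L(\xi) > 0, \quad (17)$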
where $\xi$ is defined by (13). If, in addition, $\xi = 0$ or $\xi$ is such that there exists $\varepsilon > 0$ with $\lambda z^2 < h(z)$ for $z \in (\xi - \varepsilon, \xi)$, then we have, for any $u \in \mathcal{P}^{(n)}$:

$C(u,t)/\tilde n \xrightarrow{p} L(\xi). \quad (18)$

(ii) If $\sum_r r(r-1)p_r t_{r0} < \sum_r r p_r$, then for a uniformly chosen player $u$, we have $C(u,t) = o_p(n)$. The same result holds if $o(n)$ players are chosen uniformly at random.
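Both conditions of Theorem 11 are easy to evaluate numerically. The sketch below checks the cascade condition (14) and computes the smallest root $\zeta \in [0,1]$ of equation (15) by monotone fixed-point iteration started at $0$. It assumes our reading of (15) as $\sum_d d p_d t_{d0}(1-\zeta^{d-1}) = \lambda(1-\zeta)$; `t0` is a hypothetical callable returning $t_{d0}$.

```python
def cascade_condition(p, t0):
    """Check sum_r r(r-1) p_r t_{r0} > sum_r r p_r, with p given as {degree: probability}."""
    lam = sum(r * pr for r, pr in p.items())
    return sum(r * (r - 1) * pr * t0(r) for r, pr in p.items()) > lam


def pivotal_fixed_point(p, t0, tol=1e-15, max_iter=100000):
    """Smallest z in [0, 1] with sum_d d p_d t_{d0} (1 - z^(d-1)) = lam (1 - z).

    Iterates z <- 1 - sum_d d p_d t_{d0} (1 - z^(d-1)) / lam from z = 0; the map is
    increasing in z, so the iterates increase to the smallest fixed point.
    """
    lam = sum(d * pd for d, pd in p.items())
    z = 0.0
    for _ in range(max_iter):
        new = 1.0 - sum(
            d * pd * t0(d) * (1.0 - z ** (d - 1)) for d, pd in p.items() if d >= 1
        ) / lam
        if abs(new - z) < tol:
            return new
        z = new
    return z
```

For a 3-regular graph with $t_{30} = 0.6$, the condition holds ($3.6 > 3$) and the iteration converges to $\zeta = 2/3$, the root of $0.6z^2 - z + 0.4 = 0$ in $(0,1)$.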

When $\gamma_r = 0$ for all $r \ge 0$, Theorem 11 corresponds to the result of [20]. When we add cliques in the graph, the effect on the epidemic can be described by the following lemma.

Lemma 12. We consider a clique in $G(n,d,\gamma)$ where all vertices are inactive, and at least one of them has an active neighbor outside the clique. If the threshold $k$ of the (vertices in the) clique is zero, then the epidemic will propagate to the whole clique. On the contrary, if $k$ is positive, then the clique cannot become active, even if all neighbors outside are active.

Indeed, if $k = 0$, each vertex in the clique needs only one active neighbor to become active. If $k > 0$, each vertex in the clique needs at least two active neighbors to become active. Yet each vertex of the clique has only one (active) neighbor outside, its other neighbors being the (inactive) vertices inside the clique.
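Lemma 12 can be checked on a small example with the final-state algorithm of Subsection 4.1. The sketch assumes a clique of size $d = 4$ whose vertex $i$ has a single outside neighbor labelled `100 + i` (hypothetical labels).

```python
from collections import deque


def final_active_set(adj, k, seeds):
    """Final state of the symmetric threshold model (v activates with > k[v] active neighbours)."""
    active = set(seeds)
    count = {v: 0 for v in adj}
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in active:
                count[w] += 1
                if count[w] > k[w]:
                    active.add(w)
                    queue.append(w)
    return active


def clique_with_pendants(d):
    """Clique on vertices 0..d-1; vertex i also has one outside neighbour 100 + i."""
    adj = {i: [j for j in range(d) if j != i] + [100 + i] for i in range(d)}
    for i in range(d):
        adj[100 + i] = [i]
    return adj


d = 4
adj = clique_with_pendants(d)
clique = set(range(d))
# threshold 0: a single active outside neighbour activates the whole clique
k0 = {v: 0 for v in adj}
print(clique <= final_active_set(adj, k0, {100}))  # True
# positive threshold inside the clique: even all outside neighbours active are not enough
k1 = {v: (1 if v in clique else 0) for v in adj}
print(clique & final_active_set(adj, k1, {100 + i for i in range(d)}))  # set()
```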

Hence a clique with positive threshold in which all vertices are initially inactive always stops the epidemic. This simple observation allows us to compare the epidemic in the original graph $G(n,d)$ with the epidemic in the graph $G(n,d,\gamma)$ with additional cliques: since cliques tend to stop the epidemic, it is also easy to see that if there is no global cascade in $G(n,d)$, then there is none in $G(n,d,\gamma)$ (more details are given in the proof, Subsection 5.5).

The fact that the converse is also true is more remarkable, since the cliques with positive threshold stop the epidemic. In fact, those cliques reduce the size of the cascade, but they have no impact on whether a cascade is possible or not. Indeed, it is shown in [50] that a global cascade is possible in the graph $G(n,d)$ if and only if the set of pivotal players (in $G(n,d)$) is large. Note that one direction of this equivalence is easy, and holds for any graph (in particular, it is still true for $G(n,d,\gamma)$): if the set of pivotal players is large, any pivotal player which is initially active can trigger a global cascade, as explained at the beginning of this subsection. Let us assume now that there is a global cascade in $G(n,d)$. Using the equivalence shown in [50], the set of pivotal players (in $G(n,d)$) is large. This implies that the set of pivotal players in the graph $G(n,d,\gamma)$ (with additional cliques) is also large (details are given in the proof, Subsection 5.5), so that there is a cascade in the graph $G(n,d,\gamma)$.

Hence there is a global cascade in $G(n,d,\gamma)$ if and only if there is one in the original graph $G(n,d)$. This explains why the cascade condition only depends on the original distribution $p$ and the threshold distribution $t$ (and not on $\gamma$). Yet these two graphs ($G(n,d)$ and $G(n,d,\gamma)$) do not have the same asymptotic degree distribution. What is interesting now is to compare two graphs that have the same asymptotic degree distribution $\tilde p = (\tilde p_r)_{r \ge 0}$ but different clustering coefficients.

4.3 Effect of clustering on the contagion threshold 

We use our results to highlight the effect of clustering for the game-theoretic contagion model proposed by Blume [3] and Morris [22]. Consider a graph $G$ in which the nodes are the individuals in the population and there is an edge $(i,j)$ if $i$ and $j$ can interact with each other. Each node has a choice between two possible actions labeled A and B. On each edge $(i,j)$, there is an incentive for $i$ and $j$ to have their actions match, which is modeled as the following coordination game parametrized by a real number $q \in (0,1)$: if $i$ and $j$ choose A (resp. B), they each receive a payoff of $q$ (resp. $1-q$); if they choose opposite actions, then they receive a payoff of 0. Then the total payoff of a player is the sum of the payoffs with each of her neighbors. If the degree of node $i$ is $d_i$ and $S_i^B$ is the number of its neighbors playing B, then the payoff to $i$ from choosing A is $q(d_i - S_i^B)$ while the payoff from choosing B is $(1-q)S_i^B$. Hence, in a best-response dynamic, $i$ should adopt B if $S_i^B > q d_i$ and A if $S_i^B < q d_i$. A number of qualitative insights can be derived from such a model even at this level of simplicity [18]. Specifically, consider a network where all nodes initially play A. If a small number of nodes are forced to adopt strategy B (the seed) and we apply best-response updates to other nodes in the network, then these nodes will repeatedly apply the following rule: switch to B if enough of your neighbors have already adopted B. There can be a cascading sequence of nodes switching to B such that a network-wide equilibrium is reached in the limit. Note that, as opposed to the diffusion process, the dynamics of the contagion process is deterministic once the seed is fixed. The contagion process is a particular case of the symmetric threshold model. Indeed, the threshold distribution is given by $t_{s\ell} = \mathbf{1}(\lfloor qs \rfloor = \ell)$ for all $0 \le \ell \le s$. Since $t_{r0} = 1$ if and only if $r < q^{-1}$, the cascade condition (14) is satisfied if and only if the parameter $q$ of the contagion is less than the contagion threshold

$q_c := \sup\Big\{q : \sum_{r < q^{-1}} r(r-1)p_r > \sum_r r p_r\Big\}. \quad (19)$

Figure 5: On the left: Contagion thresholds in two graphs with the same degree distribution $\tilde p_r \propto r^{-\tau}e^{-r/50}$. On the right: Contagion thresholds in two graphs with the same degree distribution $\tilde p_r = e^{-\Lambda}\Lambda^{r-1}/(r-1)!$.
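Since $\sum_{r<q^{-1}} r(r-1)p_r$ only changes when $q^{-1}$ crosses an integer, $q_c$ equals $1/K^*$, where $K^*$ is the smallest degree $K$ with $\sum_{r\le K} r(r-1)p_r > \sum_r r p_r$ (and $q_c = 0$ if there is none). A minimal sketch of this computation, for a finitely supported distribution given as a dictionary (the function name is ours):

```python
def contagion_threshold(p):
    """q_c = sup{q : sum_{r < 1/q} r(r-1) p_r > sum_r r p_r}, p given as {degree: probability}.

    For q in [1/(K+1), 1/K) the condition reads sum_{r <= K} r(r-1) p_r > lambda, so
    q_c = 1/K* with K* the smallest such K (q_c = 0 if no K works).
    """
    lam = sum(r * pr for r, pr in p.items())
    partial = 0.0
    for r in sorted(p):
        partial += r * (r - 1) * p[r]
        if partial > lam:
            return 1.0 / r
    return 0.0
```

For a $d$-regular graph this gives $q_c = 1/d$; for instance `contagion_threshold({3: 1.0})` returns `1/3`.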

We restrict ourselves to the case where $\gamma_r = \gamma$ for all $r \ge 0$, and we use Proposition 7 to construct two graphs with the same asymptotic degree distribution $\tilde p$, one with a positive clustering coefficient, the other with no clustering. We then compare the contagion thresholds in these two graphs.

In Figure 5 on the left, we consider a power law degree distribution with parameter $\tau > 0$ and exponential cutoff: for all $r \ge 1$, $\tilde p_r \propto r^{-\tau}e^{-r/50}$. On the one hand, we consider (in red) the graph $G^1(\tau)$ for $C = C^{\max}$ (so that $\gamma = 1$ and the distribution in the original graph is $p_r \propto r^{-(\tau+1)}e^{-r/50}$). On the other hand, we consider (in blue) the graph $G^0(\tau)$ given by Proposition 7 for $C = 0$ (so that $\gamma = 0$ and the distribution in the original graph is $p_r = \tilde p_r$). In Figure 5 (on the left), we make the parameter $\tau$ vary: the red (resp. blue) curve corresponds to the contagion threshold $q_c^1(\tau)$ (resp. $q_c^0(\tau)$) of the graph $G^1(\tau)$ (resp. $G^0(\tau)$), defined in (19). Contagion thresholds are given with respect to the mean degree $\lambda = \sum_r r\tilde p_r$ (which is a decreasing function of $\tau$).

In Figure 5 on the right, we consider another form for the degree distribution $\tilde p$: let $\Lambda > 0$, and set $\tilde p_r = e^{-\Lambda}\Lambda^{r-1}/(r-1)!$ for all $r \ge 1$. As before, we consider the graph $G^1(\Lambda)$ for $C = C^{\max}$ ($\gamma = 1$ and $p$ is a Poisson distribution with parameter $\Lambda$: $p_r = e^{-\Lambda}\Lambda^r/r!$), and the graph $G^0(\Lambda)$ given by Proposition 7 for $C = 0$ ($\gamma = 0$ and $p_r = \tilde p_r$). In Figure 5 (on the right), we plot the contagion thresholds for these two graphs, with respect to the mean degree $\lambda = \Lambda + 1$.

Both left and right-hand sides of Figure 5 show that, when the mean degree $\lambda$ of the graph is low, the contagion threshold $q_c^0$ of the graph with no clustering is greater than the threshold $q_c^1$ of the graph with positive clustering. Hence, if the parameter $q$ of the contagion process is in the interval $(q_c^1, q_c^0)$, a global cascade is possible only in the graph with no clustering: in that case, the clustering 'inhibits' the contagion process. On the contrary, for high values of the mean degree, we have $q_c^0 < q_c^1$, so the clustering increases the range of the parameter $q$ for which a global cascade is possible.

Figure 6: Evolution of the contagion threshold in a graph with mean degree $\lambda$ with respect to the clustering coefficient $C$ (for a fixed power law degree distribution).

Let us fix the parameter $q \in (0,1)$ of the contagion ($q$ sufficiently low): this corresponds to a horizontal cut in Figure 5. The interval of mean degrees $\lambda$ for which a global cascade is possible moves to the right when the clustering increases. Hence, when the parameter of the contagion is fixed, clustering favors contagion processes on graphs with a higher mean degree.

Now we study more precisely what happens if we fix the mean degree in the graph (which corresponds to a vertical cut in Figure 5) and increase the clustering coefficient between 0 and its maximal value $C^{\max}$ (here we consider the first notion of clustering coefficient; the same phenomenon appears with the second notion).

In Figure 6 on the top left corner (resp. top right corner, bottom), we consider a power law degree distribution with exponential cutoff: $\tilde p_r \propto r^{-\tau}e^{-r/50}$, with parameter $\tau = 2.5$ (resp. $\tau = 1.81$, $\tau = 0.1$). We plot the contagion threshold $q_c$ for the graph given by Proposition 7 when the degree distribution is $\tilde p$ and the clustering coefficient varies from 0 to $C^{\max}$. We consider three different slices of Figure 5 (left), and we go from the blue curve ($C = 0$) to the red one ($C = C^{\max}$), progressively increasing the clustering coefficient. For a very low value of the mean degree ($\lambda \approx 1.65$, top left corner of Figure 6), the contagion threshold decreases with the clustering. The opposite happens when the mean degree is very high ($\lambda \approx 46$, bottom). In addition, for some intermediate values of the mean degree, as for $\lambda \approx 3.22$ (top right corner), low values of the clustering 'help' the contagion process, but, as the clustering coefficient becomes higher, the opposite happens: it 'inhibits' the contagion process more and more.
Figure 7: Set of pivotal players and cascade sizes for $q = 0.15$

We see that the impact of clustering is different for low and for high values of the mean degree. In the low regime, the contagion becomes more and more difficult as the clustering increases. On the contrary, in the high regime, the higher the clustering, the more it 'helps' the contagion. For intermediate values of the mean degree, the effect of clustering is ambiguous: a low clustering coefficient 'helps' the contagion process, but a high one 'inhibits' it.

4.4 Effect of clustering on the cascade size for the contagion model 

We still consider the game-theoretic contagion model proposed by Morris [22] (described in the introduction), and the case where $\gamma_r = \gamma$ for all $r \ge 0$. In this subsection, the parameter $q \in (0,1)$ of the contagion process is fixed, and we want to highlight the effect of the clustering on the cascade size.

First we compare two graphs with the same asymptotic degree distribution $\tilde p$, one having a positive clustering coefficient, the other having no clustering. In Figure 7, we plot the sizes of the cascade and of the pivotal players set for each of these graphs.

More precisely, in Figure 7 we fix $q = 0.15$. The red curves correspond to a graph with positive clustering, constructed as follows: we start from a Poisson distribution with parameter $\Lambda$ for $p$, and $\gamma = 0.2$. This gives $\tilde p_r = \frac{0.2r + 0.8}{0.2\Lambda + 0.8}\,e^{-\Lambda}\frac{\Lambda^r}{r!}$ and a positive clustering coefficient $C > 0$. The blue curves correspond to a graph with the same asymptotic distribution $\tilde p$ but no clustering (in that case, $p = \tilde p$ and $\gamma = 0$). We make the parameter $\Lambda$ vary, and the sizes of the cascade (solid lines) and of the pivotal players set (dotted lines) are plotted with respect to the mean degree $\lambda = \sum_r r\tilde p_r$ in the graph.
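The distribution $\tilde p$ above follows from the construction: a degree-$r$ vertex either stays as it is (probability $1-\gamma$) or becomes $r$ vertices of degree $r$ (probability $\gamma$), so $\tilde p_r \propto p_r(1-\gamma+\gamma r)$. The sketch below (truncated Poisson support; function names are ours) recomputes the mean degree $\lambda$ used on the horizontal axis of Figure 7.

```python
import math


def poisson(lam, rmax=200):
    """Poisson(lam) probabilities for r = 0..rmax, computed iteratively (no factorials)."""
    p, pr = {}, math.exp(-lam)
    for r in range(rmax + 1):
        p[r] = pr
        pr *= lam / (r + 1)
    return p


def degree_distribution_with_cliques(p, gamma):
    """p~_r proportional to p_r (1 - gamma + gamma r), for constant gamma."""
    weights = {r: pr * (1 - gamma + gamma * r) for r, pr in p.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


p_tilde = degree_distribution_with_cliques(poisson(2.0), 0.2)
print(round(sum(r * q for r, q in p_tilde.items()), 4))  # 2.3333
```

For $\Lambda = 2$ the mean degree is $(\Lambda + 0.2\Lambda^2)/(0.8 + 0.2\Lambda) = 7/3$.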

For each graph, we observe that there is a cascade if and only if the set of pivotal players is large, as explained in Subsection 4.2. In addition, the interval of mean degrees $\lambda$ for which a cascade is possible moves to the right when the clustering coefficient increases, which is consistent with our observations on Figure 5. Finally, we observe that the size of the cascade (when it exists) decreases with the clustering. This comes from the fact that cliques of degree $d \ge q^{-1}$ (i.e. cliques with positive threshold $\lfloor qd \rfloor$) stop the contagion process (as explained in Lemma 12). In the extremal case when $\gamma = 1$ (each vertex of degree $d$ is replaced by a clique of size $d$), the cascade is exactly the set of pivotal players. When the probability $\gamma$ of replacing a vertex by a clique increases, the cascade triggered by a pivotal player becomes closer and closer to the set of pivotal players (until it is exactly the set of pivotal players). This observation is confirmed in Figure 8.

Figure 8: Effect of the clustering on the cascade size for $q = 0.12$, in a graph with a fixed power law degree distribution of mean $\lambda$.

To study more precisely the effect of clustering on the cascade size, we plot (in Figure 8) the cascade size for $q = 0.12$, with respect to the clustering coefficient of a graph with power law degree distribution with exponential cutoff: $\tilde p_r \propto r^{-\tau}e^{-r/50}$, with parameter $\tau = 2.5$ (resp. $\tau = 1.81$, $\tau = 1.3$, $\tau = 1$), so that the mean degree of the graph is $\lambda \approx 1.65$ (resp. $\lambda \approx 3.22$, $\lambda \approx 7.34$, $\lambda \approx 12.62$). Note that the values of the mean degree $\lambda$ on the right-hand side of Figure 8 correspond to the case where the clustering 'helps' the contagion to spread, while the case $\lambda \approx 1.65$ corresponds to the case where the clustering 'inhibits' the contagion, as detailed in the previous subsection. As for Figure 7, we observe that the cascade size decreases with the clustering coefficient when the cascade size is positive (i.e. when a cascade is possible). The fact that a cascade is not possible for low values of clustering (right-hand side) comes from the fact that, for a fixed parameter $q$, the interval of $\lambda$ for which a cascade is possible moves to the right as the clustering increases, as observed in Figures 5 and 7.



4.5 Phase transition for the symmetric threshold model with degree based activation

In this subsection, we allow a positive fraction of nodes to be active at the beginning of the symmetric threshold process. More precisely, on a given graph $G$, the set $S$ of initial active nodes is random, and each node of degree $d$ in $G$ belongs to $S$ with some probability $\alpha_d \ge 0$, independently for each node. We set $\alpha = (\alpha_d)_{d \ge 0}$.

We define an adaptation of the usual degree based activation for the random graph $G(n,d,\gamma)$ (so that the initial activation differs from the one in Subsection 3.5). First we draw independent random variables for each vertex in the original graph $G(n,d)$. More precisely, for each vertex $i$ (of degree $d_i$ in $G(n,d)$), we draw a Bernoulli random variable $a(i)$ with parameter $\alpha_{d_i}$. When a vertex $i$ of $G(n,d)$ is replaced by a clique in $G(n,d,\gamma)$, we associate to each vertex inside the clique the same activation variable $a(i)$ (if $i$ is not replaced by a clique, it keeps its own activation variable). Each vertex $v$ in $G(n,d,\gamma)$ belongs to the initial seed $S$ if and only if $a(v) = 1$. Note that each node of degree $d$ in $G(n,d,\gamma)$ belongs to $S$ with probability $\alpha_d$ (since vertices inside the clique generated by $i$ have the same degree as $i$). Thus the only difference with the usual degree based activation is that activation variables are not independent inside a clique: either the whole clique belongs to the initial seed $S$, or no vertex in the clique belongs to $S$.
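The clique-level seeding can be sketched in the same way as the clique-level thresholds: one Bernoulli variable per original vertex, copied to all of its descendants. The `parent` map and the callable `alpha` are illustrative assumptions.

```python
import random


def draw_clique_seeds(parent, parent_degree, alpha, rng=random):
    """Initial seed S in G(n, d, gamma).

    One Bernoulli(alpha_{d_i}) variable a(i) is drawn per original vertex i and shared
    by every vertex whose parent is i, so a clique is either fully seeded or not at all.
    """
    a = {i: rng.random() < alpha(s) for i, s in parent_degree.items()}
    return {v for v, i in parent.items() if a[i]}
```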
Using the notation $\tilde\gamma$ defined in Proposition 5 and the binomial probabilities $b_{sr}(p)$ defined at the end of Section 1, we define (omitting the dependence on $\alpha$, $t$, $p$ and $\gamma$):

[expression for $L(z)$ illegible in the source]

$h(z) := \sum_s (1-\alpha_s)\, s p_s \big[t_{s0} z^s + \gamma_s(1-t_{s0}) z\big] + \sum_s (1-\alpha_s)(1-\gamma_s) p_s \sum_{\ell \neq 0} \sum_{r \ge s-\ell} r\, t_{s\ell}\, b_{sr}(z),$

$\xi := \sup\{z \in [0,1) : \lambda z^2 = h(z)\}. \quad (20)$

Theorem 13. Consider the random graph $G(n,d,\gamma)$ for a sequence $d$ satisfying Condition 1 with probability distribution $p = (p_r)_{r=0}^{\infty}$ and clustering parameter $\gamma = (\gamma_r)_{r=0}^{\infty}$. Let $t$ be a family of probability distributions, and $k$ random thresholds drawn according to $t$ in the original graph $G(n,d)$ (i.e. if $i$ is a vertex in $G(n,d)$ replaced by a clique in $G(n,d,\gamma)$, then all vertices in the clique have the same threshold $k(i)$). We are given an activation set $S$ drawn according to the distribution $\alpha$ (so that vertices in the same clique are either all active or all inactive). Then, for the symmetric threshold model defined in Subsection 4.1, if $\xi = 0$, or if $\xi \in (0,1]$ and further $\xi$ is such that there exists $\varepsilon > 0$ with $\lambda z^2 < h(z)$ for $z \in (\xi - \varepsilon, \xi)$, the size $C(t,\alpha)$ of the set of active nodes at the end of the symmetric threshold process verifies:

$C(t,\alpha)/\tilde n \xrightarrow{p} L(\xi).$

Heuristically, taking $\alpha_s = 0$ for all $s$ in the definitions of the previous theorem allows one to recover the result of Theorem 11. When $\gamma_r = 0$ for all $r \ge 0$, we recover a result in [20].

If we apply this result to the case where thresholds are constant among nodes (i.e. there exists an integer $k$ such that $k(v) = k$ for each vertex $v$), our model corresponds to a slight modification of the usual bootstrap percolation. Indeed, the initial activation here is not independent among nodes that belong to the same clique.



5 Proofs 

Throughout this section, we consider a sequence $d$ satisfying Condition 1 with probability distribution $p = (p_r)_{r \ge 0}$.



5.1 Configuration Model 

In order to prove Theorem 8, it will be more convenient to work with the configuration model $G^*(n,d)$ (see for instance [5]): each vertex $i$, $1 \le i \le n$, has $d_i$ half-edges, and the random graph $G^*(n,d)$ is obtained by taking a uniform matching among all possible matchings of half-edges into pairs. Conditioned on this multigraph being simple, it is distributed as $G(n,d)$. Condition 1 implies (in particular) that

$\liminf \mathbb{P}\big(G^*(n,d) \text{ is simple}\big) > 0 \quad (21)$

(see [H]), which allows us to transfer directly results that hold in probability for $G^*(n,d)$ to the model $G(n,d)$.
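A configuration-model sample is a uniform matching of half-edges, which can be obtained by shuffling the list of half-edges and pairing consecutive entries; conditioning on simplicity can then be done by rejection. A minimal sketch (function names are ours; rejection is only practical when the probability of being simple is not too small, which (21) bounds away from zero asymptotically):

```python
import random


def configuration_model(degrees, rng=random):
    """Uniform matching of half-edges; returns a list of edges (loops/multi-edges allowed)."""
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("the sum of degrees must be even")
    rng.shuffle(stubs)  # a uniform permutation, paired consecutively, gives a uniform matching
    return [(stubs[j], stubs[j + 1]) for j in range(0, len(stubs), 2)]


def is_simple(edges):
    """True if the multigraph has no loop and no multiple edge."""
    seen = set()
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            return False
        seen.add(key)
    return True


def simple_random_graph(degrees, rng=random, max_tries=10000):
    """Rejection sampling: condition the configuration model on being simple."""
    for _ in range(max_tries):
        edges = configuration_model(degrees, rng)
        if is_simple(edges):
            return edges
    raise RuntimeError("no simple graph found")
```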
Figure 9: Transformation of a subgraph in $G^*(n,d,\gamma)$ (or $G(n,d,\gamma)$) by $\phi$

As for the simple graph, we consider the model $G^*(n,d,\gamma)$: we associate to each $i \in \{1,\ldots,n\}$ a Bernoulli variable $X(i)$ with parameter $\gamma_{d_i}$, all variables being independent. If $X(i) = 1$, we replace node $i$ by a clique of size $d_i$ in which each vertex has exactly $d_i - 1$ neighbors inside the clique, and one half-edge outside. Then we match half-edges as for $G^*(n,d)$. Hence $G^*(n,d,\gamma)$ is simple if and only if $G^*(n,d)$ is. So, conditioned on $G^*(n,d,\gamma)$ being simple, it is distributed as $G(n,d,\gamma)$, and equation (21) implies that

$\liminf \mathbb{P}\big(G^*(n,d,\gamma) \text{ is simple}\big) > 0.$

Therefore, we can prove Theorems 8 and 11 for one of the models $G^*(n,d,\gamma)$ or $G(n,d,\gamma)$, and it will imply that they are true for both.
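The construction of $G^*(n,d,\gamma)$ can be sketched directly: each replaced vertex contributes a clique whose vertices each own one outside half-edge, and all half-edges are then matched uniformly. In this sketch, the vertex labels `(i, j)` and the callable `gamma` are our own conventions.

```python
import random


def clustered_configuration_model(degrees, gamma, rng=random):
    """Sketch of G*(n, d, gamma).

    Vertex i is replaced, with probability gamma(d_i), by a clique of d_i vertices, each
    with d_i - 1 clique neighbours and one outside half-edge; half-edges are then matched
    uniformly. Vertices are labelled (i, j); an unreplaced vertex i is (i, 0).
    Returns (parent, clique_edges, outside_edges).
    """
    parent, clique_edges, stubs = {}, [], []
    for i, d in enumerate(degrees):
        if rng.random() < gamma(d):  # X(i) = 1
            members = [(i, j) for j in range(d)]
            clique_edges += [(members[a], members[b]) for a in range(d) for b in range(a + 1, d)]
            stubs += members  # one outside half-edge per clique vertex
        else:  # X(i) = 0: vertex i keeps its d_i half-edges
            members = [(i, 0)]
            stubs += [(i, 0)] * d
        for v in members:
            parent[v] = i
    if len(stubs) % 2:
        raise ValueError("the sum of degrees must be even")
    rng.shuffle(stubs)
    outside_edges = [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]
    return parent, clique_edges, outside_edges
```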

5.2 Link between the graph $G(n,d,\gamma)$ and the original graph $G(n,d)$

Let $G$ be distributed as $G(n,d,\gamma)$. We say that a vertex in $G(n,d,\gamma)$ has parent $i \in \{1,\ldots,n\}$ if it belongs to a clique that replaces the vertex $i$ of $G(n,d)$ (when $X(i) = 1$) or if it is $i$ (when $X(i) = 0$). For any subgraph $H \subset G$, we obtain the graph $\phi(H)$ by identifying in $H$ the vertices that have the same parent and that are connected in $H$. For instance, Figure 9 represents a clique of size 4 in $G$ that comes from the replacement of a vertex $i$: thus all the vertices in the clique have the same parent $i$. In the subgraph $H$, some of the edges of the clique are not present (those in dots): the clique is split into two connected components. In the corresponding graph $\phi(H)$, we merge the vertices of the clique that are connected together.

We use the same definition of $\phi(H)$ when $H$ is a subgraph of a random graph distributed as $G^*(n,d,\gamma)$.
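The projection $\phi$ amounts to contracting, with a union-find structure, the clique edges that are present in $H$. This sketch assumes that "connected" means connected through edges of $H$ lying inside the clique (as in Figure 9) and that the caller separates those edges from the others; the remaining edges are relabelled by the merged vertices. The function name is ours.

```python
def project(vertices, kept_clique_edges, other_edges):
    """Sketch of phi(H): merge same-parent vertices joined by surviving clique edges.

    vertices          : vertices of H
    kept_clique_edges : edges of H lying inside a clique (both ends have the same parent)
    other_edges       : all remaining edges of H
    Returns (projected vertices, projected edge list); a merged vertex is represented
    by one of its members.
    """
    root = {v: v for v in vertices}

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    for u, v in kept_clique_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            root[ru] = rv
    new_vertices = {find(v) for v in vertices}
    new_edges = [(find(u), find(v)) for u, v in other_edges]
    return new_vertices, new_edges
```

On the example of Figure 9 (a clique of size 4 split into two pairs), the clique yields two projected vertices, each of degree 2.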

For any graph $G$, let $\nu(G)$ denote the number of vertices in $G$. The next lemma will be useful in several proofs.

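Lemma 14. Let $G$ be distributed as $G(n,d,\gamma)$. Let $H$ be a subgraph of $\phi(G)$ such that $\nu(H) = o_p(n)$. Let $\tilde H$ be the maximal subgraph of $G$ such that $\phi(\tilde H) = H$. Then we have $\nu(\tilde H) = o_p(\tilde n)$.

Proof. Let $\nu_r(H)$ be the number of vertices of $H$ whose parent has degree $r$; each of them corresponds to at most $r$ vertices of $\tilde H$. We can bound $\nu(\tilde H)/n$ using the Cauchy-Schwarz inequality:

$\nu(\tilde H)/n \le \sum_r r\,\nu_r(H)/n \le \Big(\sum_r r^2 \nu_r(H)/n\Big)^{1/2}\Big(\nu(H)/n\Big)^{1/2} \le \Big(\sum_i d_i^2/n\Big)^{1/2}\Big(\nu(H)/n\Big)^{1/2}.$

Yet Condition 1 implies that $\sum_i d_i^2/n = O(1)$, and by hypothesis $\nu(H)/n \to_p 0$, so $\nu(\tilde H)/n \to_p 0$, and $\nu(\tilde H)/\tilde n \to_p 0$ due to Proposition 5. □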
5.3 Proof of Theorem 8



A heuristic (using a branching process approximation) is given in the Appendix. We first give the idea of the proof.

Let $G$ be distributed as $G^*(n,d,\gamma)$ and $\pi \in [0,1]$. In the percolated graph $G_\pi$, the removal of some edges inside a clique can split the clique into several connected components. In order to study the percolated graph $G_\pi$ (as described in Section 3), we proceed in three steps:

Step 1. We consider only the edges that are inside a clique, and we delete independently each of them with probability $1-\pi$. The graph we obtain is called $G_\pi^{(1)}$. We are interested in the graph $G' = \phi(G_\pi^{(1)})$. If we condition $G'$ on its number of vertices $n'$ and its degree sequence $d'$, we have that $G'$ is distributed as $G^*(n',d')$. The first step consists in computing the asymptotic distribution of the degree sequence $d'$.

Step 2. Then we delete independently with probability $1-\pi$ each edge of $G'$, and we apply results of [13] in order to study the components' sizes in the percolated graph $G'_\pi$.

Step 3. We deduce the components' sizes in $G_\pi$ from the previous step, using the fact that $\phi(G_\pi)$ is distributed as $G'_\pi$.

In the following, when we consider the model $G^*(n,d,\gamma)$, we take the multiplicity of edges into account when we compute the degree of a vertex. More precisely, we say that a vertex in $G^*(n,d)$ or $G^*(n,d,\gamma)$ has 'degree' $d$ if it has $d$ (simple) half-edges. For instance, each loop at a given vertex contributes 2 to its degree.

Step 1. For $d \geq 1$, let $V_d^{(\gamma)}$ be the set of vertices $i$ in $G^*(n, \mathbf{d})$ with degree $d$ and such that $X(i) = 1$: $i$ is replaced by a clique $K(i)$ of size $d$ in $G$. Let $K(i,\pi)$ be the subgraph of $K(i)$ obtained after a bond percolation with parameter $\pi$. We consider the subgraph $F_d(\pi) \subset G_\pi$ that contains the percolated versions of the cliques with initial size $d$:

$$F_d(\pi) := \bigcup_{i \in V_d^{(\gamma)}} K(i,\pi),$$

and we let $N^{(n)}(d,k,\pi)$ be the number of connected components of size $k$ in $F_d(\pi)$.

Lemma 15. For any $d \geq 1$ and $k \leq d$, we have that:

$$\frac{N^{(n)}(d,k,\pi)}{n} \xrightarrow{p} \frac{d}{k}\, f(d,k,\pi)\, p_d \gamma_d,$$

where $f(d,k,\pi)$ is given by (6).

Proof. For each vertex $i$ in $V_d^{(\gamma)}$, we label the vertices of $K(i,\pi)$ from 1 to $d$. We look at all the vertices with label 1, and we let $M^{(n)}(d,k,\pi)$ be the number of such vertices whose connected component in $K(i,\pi)$ has size $k$. Using the Law of Large Numbers and the fact that $|V_d^{(\gamma)}|/n \xrightarrow{p} p_d \gamma_d$, we have that $M^{(n)}(d,k,\pi)/n \xrightarrow{p} f(d,k,\pi)\, p_d \gamma_d$, where $f(d,k,\pi)$ is by definition the probability that the component of vertex 1 contains $k$ vertices. By symmetry between the labels, the total number of vertices in $F_d(\pi)$ that belong to a component of size $k$ is $d\, f(d,k,\pi)\, p_d \gamma_d\, n + o_p(n)$ and, in order to obtain the number of such components, we have to divide by $k$, which proves the lemma. □

Let $G_\pi^{\mathrm{in}}$ be the graph obtained from $G^*(n, \mathbf{d}, \gamma)$ when we replace each clique $K(i)$ (for $i$ such that $X(i) = 1$) by the percolated clique $K(i,\pi)$. For any $k \geq 0$, let $n'_k$ be the number of vertices with 'degree' $k$ in the projected graph $G' = \phi(G_\pi^{\mathrm{in}})$. In order to compute $n'_k$, we have to consider the vertices $i$ such that $X(i) = 0$ (there are $n_k - |V_k^{(\gamma)}|$ such vertices, where $n_k$ is the number of vertices with 'degree' $k$ in $G^*(n, \mathbf{d})$), and the vertices that come from a clique of initial size $d$, for some $d \geq k$ (each such vertex corresponds to a component of size $k$ in $F_d(\pi)$, so there are $N^{(n)}(d,k,\pi)$ such vertices). This gives the following relation, for all $k \geq 0$:

$$n'_k = n_k - |V_k^{(\gamma)}| + \sum_{d \geq k} N^{(n)}(d,k,\pi).$$
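Lemma 15 only involves $f(d,k,\pi)$, the law of the size of the component of a fixed vertex in a $d$-clique after bond percolation. Equation (6) is not reproduced here; as an assumption, the sketch below evaluates $f$ with the standard connectivity recursion $f(d,k,\pi) = \binom{d-1}{k-1} c_k (1-\pi)^{k(d-k)}$, where $c_k$, the probability that $K_k$ stays connected, satisfies $c_1 = 1$ and $c_m = 1 - \sum_{j=1}^{m-1} \binom{m-1}{j-1} c_j (1-\pi)^{j(m-j)}$ (choose the $k-1$ other members, require them to be connected, and forbid all edges to the $d-k$ remaining vertices). It compares $f$ with a direct simulation of one percolated clique and evaluates the limit of Lemma 15. The function names are hypothetical.

```python
import random
from math import comb


def f(d, k, pi):
    """P(component of a fixed vertex of K_d has k vertices); assumed form of (6)."""
    q = 1.0 - pi
    c = [0.0, 1.0]  # c[m] = P(K_m stays connected after percolation); c[1] = 1
    for m in range(2, k + 1):
        c.append(1.0 - sum(comb(m - 1, j - 1) * c[j] * q ** (j * (m - j)) for j in range(1, m)))
    return comb(d - 1, k - 1) * c[k] * q ** (k * (d - k))


def lemma15_limit(d, k, pi, p_d, gamma_d):
    """Limit of N^(n)(d, k, pi) / n: size-k components of F_d(pi) per vertex of G(n, d)."""
    return d / k * f(d, k, pi) * p_d * gamma_d


def simulated_f(d, pi, trials=20000, seed=0):
    """Empirical law of the component size of vertex 0 in a percolated K_d."""
    rng = random.Random(seed)
    counts = [0] * (d + 1)
    for _ in range(trials):
        adj = [[] for _ in range(d)]
        for a in range(d):
            for b in range(a + 1, d):
                if rng.random() < pi:   # keep the clique edge {a, b}
                    adj[a].append(b)
                    adj[b].append(a)
        seen, stack = {0}, [0]          # depth-first search from vertex 0
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        counts[len(seen)] += 1
    return [c / trials for c in counts]


d, pi = 5, 0.3
exact = [f(d, k, pi) for k in range(1, d + 1)]
empirical = simulated_f(d, pi)[1:]
print([round(x, 3) for x in exact])
print([round(x, 3) for x in empirical])
```

For $d = 3$ the recursion gives $f(3,3,\pi) = 3\pi^2 - 2\pi^3$, the probability that a percolated triangle stays connected.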

So Lemma 15 gives the following asymptotic distribution for the degree sequence $\mathbf{d}'$:

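Lemma 16. Let $n' := \sum_k n'_k$ be the total number of vertices in $G'$. Then the proportion of vertices with degree $k$ in $G'$ has the following limit, as $n \to \infty$:

$$\frac{n'_k}{n'} \xrightarrow{p} p'_k := \frac{Q_k}{Q},$$

where $Q_k := p_k(1 - \gamma_k) + \sum_{d \geq k} \frac{d}{k} f(d,k,\pi)\, p_d \gamma_d$ and $Q := \sum_k Q_k$.

In addition, the uniform summability of $k n_k / n$ implies the uniform summability of $k n'_k / n'$, so that $\sum_k k n'_k / n' \xrightarrow{p} \sum_k k p'_k$.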
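A short numerical sketch of Lemma 16, assuming $p$ and $\gamma$ are given as finite dictionaries indexed by degree (this encoding and the function names are illustrative). It computes $Q_k$, $Q$ and $p'_k = Q_k/Q$, and checks two consequences of the definitions: $\sum_k k\, Q_k = \sum_d d\, p_d$ (percolation inside a clique does not change the number of external half-edges), and $p' = p$, $Q = 1$ when $\pi = 1$ (cliques are left intact, so $G' = G(n, \mathbf{d})$).

```python
from math import comb


def f(d, k, pi):
    """Component-size law in a percolated d-clique (connectivity recursion; assumed form of (6))."""
    q = 1.0 - pi
    c = [0.0, 1.0]
    for m in range(2, k + 1):
        c.append(1.0 - sum(comb(m - 1, j - 1) * c[j] * q ** (j * (m - j)) for j in range(1, m)))
    return comb(d - 1, k - 1) * c[k] * q ** (k * (d - k))


def projected_distribution(p, gamma, pi):
    """Return (p_prime, Q) with p'_k = Q_k / Q as in Lemma 16."""
    dmax = max(p)
    Qk = {}
    for k in range(dmax + 1):
        value = p.get(k, 0.0) * (1.0 - gamma.get(k, 0.0))      # vertices with X(i) = 0
        if k >= 1:                                             # size-k components in cliques
            value += sum(d / k * f(d, k, pi) * p.get(d, 0.0) * gamma.get(d, 0.0)
                         for d in range(k, dmax + 1))
        Qk[k] = value
    Q = sum(Qk.values())
    return {k: v / Q for k, v in Qk.items()}, Q


p = {2: 0.2, 3: 0.5, 5: 0.3}
gamma = {3: 0.6, 5: 0.8}
p_prime, Q = projected_distribution(p, gamma, 0.4)
print(Q, p_prime)
```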

Step 2. We apply Theorem 3.9 in [13] to the random graph $G'$. Indeed, we can assume without loss of generality that the previous convergences (Lemma 16) hold a.s., and not just in probability (as in [13]), either by using the Skorohod coupling theorem (see [17], for instance) or by selecting suitable subsequences.
Then there is a giant component in the percolated graph if and only if

$$\pi \sum_d d(d-1)\, p'_d > \sum_d d\, p'_d,$$

which is equivalent to $\pi\, \mathbb{E}\left[\mathcal{K}(D^* + 1, \pi, \gamma) - 1\right] > 1$.
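The equivalence above can be checked numerically. With $\mathbb{P}(D^*+1 = d) = d\,p_d/\lambda$, $\lambda = \sum_d d\,p_d$, and $\mathcal{K}(d,\pi,\gamma)$ equal to $d$ with probability $1-\gamma_d$ and to $k$ with probability $\gamma_d f(d,k,\pi)$ (the law described in Appendix A), the ratio $\sum_d d(d-1)p'_d / \sum_d d\,p'_d$ coincides with $\mathbb{E}[\mathcal{K}(D^*+1,\pi,\gamma)-1]$. The sketch below, with hypothetical function names, computes both and locates $\pi_c$ by bisection, assuming (not proved here) that $\pi \mapsto \pi\,\mathbb{E}[\mathcal{K}(D^*+1,\pi,\gamma)-1]$ crosses 1 only once in $(0,1)$.

```python
from math import comb


def f(d, k, pi):
    """Component-size law in a percolated d-clique (connectivity recursion; assumed form of (6))."""
    q = 1.0 - pi
    c = [0.0, 1.0]
    for m in range(2, k + 1):
        c.append(1.0 - sum(comb(m - 1, j - 1) * c[j] * q ** (j * (m - j)) for j in range(1, m)))
    return comb(d - 1, k - 1) * c[k] * q ** (k * (d - k))


def mean_offspring(p, gamma, pi):
    """E[K(D* + 1, pi, gamma) - 1] with P(D* + 1 = d) = d p_d / lambda."""
    lam = sum(d * pd for d, pd in p.items())
    total = 0.0
    for d, pd in p.items():
        g = gamma.get(d, 0.0)
        in_clique = sum((k - 1) * f(d, k, pi) for k in range(1, d + 1))
        total += d * pd / lam * ((1.0 - g) * (d - 1) + g * in_clique)
    return total


def projected_ratio(p, gamma, pi):
    """sum_d d(d-1) p'_d / sum_d d p'_d, with p'_k proportional to Q_k (Lemma 16)."""
    dmax = max(p)
    Qk = [p.get(k, 0.0) * (1.0 - gamma.get(k, 0.0))
          + (sum(d / k * f(d, k, pi) * p.get(d, 0.0) * gamma.get(d, 0.0)
                 for d in range(k, dmax + 1)) if k >= 1 else 0.0)
          for k in range(dmax + 1)]
    return sum(k * (k - 1) * q for k, q in enumerate(Qk)) / sum(k * q for k, q in enumerate(Qk))


def pi_c(p, gamma, tol=1e-10):
    """Bisection for pi * E[K - 1] = 1 (assumes a single crossing in (0, 1))."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * mean_offspring(p, gamma, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(pi_c({3: 1.0}, {}), pi_c({3: 1.0}, {3: 1.0}))
```

For 3-regular graphs the threshold moves from $1/2$ without clustering to a value between $0.6$ and $0.65$ when every vertex is replaced by a triangle, in line with the observation that clustering inhibits diffusion.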

Step 3. The proof of (ii) follows easily from the previous step and Lemma 14.

We give the main lines of the proof of (i). Assume $\pi > \pi_c$, which corresponds to $\pi \sum_d d(d-1)\, p'_d > \sum_d d\, p'_d$. Let $\tilde{C}_1$ be the largest connected component in $G'_\pi = \phi(G_\pi)$ and $C_1$ be the connected component of $G_\pi$ such that $\phi(C_1) = \tilde{C}_1$.

We first compute the limit of $v(C_1)/n$ as $n \to \infty$. Let $g$ be the generating function

$$g(x) := \sum_k p'_k x^k = \frac{1}{Q} \sum_k Q_k x^k,$$

and recall that its mean is denoted by $\lambda' = \sum_k k Q_k / Q$. Results in [13] show that the number $v_r(\tilde{C}_1)$ of vertices with degree $r$ in $\tilde{C}_1$ satisfies $v_r(\tilde{C}_1)/n' \xrightarrow{p} \sum_{\ell \geq r} b_{\ell r}(\sqrt{\pi})\, p'_\ell\, (1 - \xi^r)$, where $n'$ is the total number of vertices in the graph $G'$, and $\xi$ is the unique $\xi \in (0,1)$ such that

$$g'\left(1 - \pi^{1/2} + \pi^{1/2} \xi\right) = \lambda' \left(1 - \pi^{-1/2} + \pi^{-1/2} \xi\right). \qquad (22)$$

Since $h(z) = (1 - \pi + \pi z)\, g'(1 - \pi + \pi z)$, $\xi$ is the solution of (22) if and only if $\zeta = 1 - \pi^{-1/2} + \pi^{-1/2} \xi$ is the solution of $\lambda' \zeta (1 - \pi + \pi \zeta) = h(\zeta)$. (Indeed, with this change of variables, $1 - \pi^{1/2} + \pi^{1/2}\xi = 1 - \pi + \pi\zeta$, so (22) reads $g'(1 - \pi + \pi\zeta) = \lambda'\zeta$.) Note that we used the second notation in the statement of Theorem 8 in order to be consistent with the notations of Theorem 10 (in fact, we could have used either the results in [13] or Theorem 11 of [20] for the current proof).
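The change of variables between $\xi$ and $\zeta$ can be checked numerically. The sketch below (function names hypothetical; $p'$ given as a dictionary) relies on the standard fact that, in the supercritical regime, $u = 1 - \pi + \pi\zeta$ is the smallest fixed point in $[0,1]$ of $x \mapsto 1 - \pi + \pi\, g'(x)/\lambda'$, reached by monotone iteration from $x = 0$. It then recovers $\zeta$ and $\xi$ and evaluates the residuals of (22) and of $\lambda'\zeta(1-\pi+\pi\zeta) = h(\zeta)$. For the 3-regular case with $\pi = 0.8$, the fixed-point equation is $0.8u^2 - u + 0.2 = 0$, whose smallest root is $u = 1/4$, hence $\zeta = 1/16$.

```python
from math import sqrt


def g_prime(p_prime, x):
    """Derivative of g(x) = sum_k p'_k x^k."""
    return sum(k * q * x ** (k - 1) for k, q in p_prime.items() if k >= 1)


def percolation_fixed_point(p_prime, pi, tol=1e-15, max_iter=100_000):
    """Return (u, zeta, xi), with u = 1 - pi + pi * zeta the smallest fixed point in [0, 1]."""
    lam = g_prime(p_prime, 1.0)   # lambda' = g'(1)
    u = 0.0
    for _ in range(max_iter):
        u_next = 1.0 - pi + pi * g_prime(p_prime, u) / lam
        done = abs(u_next - u) < tol
        u = u_next
        if done:
            break
    zeta = (u - 1.0 + pi) / pi
    xi = 1.0 - sqrt(pi) + sqrt(pi) * zeta   # inverse of zeta = 1 - pi^(-1/2) + pi^(-1/2) xi
    return u, zeta, xi


def check_change_of_variables(p_prime, pi):
    """Residuals of (22) and of lambda' zeta (1 - pi + pi zeta) = h(zeta)."""
    lam = g_prime(p_prime, 1.0)
    _, zeta, xi = percolation_fixed_point(p_prime, pi)
    s = sqrt(pi)
    res22 = g_prime(p_prime, 1 - s + s * xi) - lam * (1 - 1 / s + xi / s)
    y = 1 - pi + pi * zeta
    res_h = lam * zeta * y - y * g_prime(p_prime, y)   # h(zeta) = y g'(y)
    return res22, res_h


u, zeta, xi = percolation_fixed_point({3: 1.0}, 0.8)
print(u, zeta, xi)
```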

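Unfortunately, we cannot deduce the size of $C_1$ directly from the asymptotics of $v_r(\tilde{C}_1)/n'$, $r \geq 0$. We have to be more precise: let $v_r^0(\tilde{C}_1)$ (resp. $v_r^1(\tilde{C}_1)$) be the number of vertices $i$ with degree $r$ in $\tilde{C}_1$ such that $X(i) = 0$ (resp. $X(i) = 1$). Then we have:

$$\frac{v_r^0(\tilde{C}_1)}{n'} \xrightarrow{p} \sum_{\ell \geq r} b_{\ell r}(\sqrt{\pi})\, p_\ell (1 - \gamma_\ell)\, (1 - \xi^r) / Q,$$

$$\frac{v_r^1(\tilde{C}_1)}{n'} \xrightarrow{p} \sum_{\ell \geq r} b_{\ell r}(\sqrt{\pi}) \sum_{d \geq \ell} \frac{d}{\ell} f(d,\ell,\pi)\, p_d \gamma_d\, (1 - \xi^r) / Q.$$

In these summations, $d$ represents the degree of vertices in the initial graph, $\ell$ the degree of vertices in $G'$ (after the percolation inside cliques), and $r$ the degree of vertices after the percolation on external edges. In order to recover $v(C_1)$, we have to multiply each term in $v_r^1(\tilde{C}_1)$ by $\ell$ (a vertex of $G'$ coming from a component of size $\ell$ stands for $\ell$ vertices of $G_\pi$), and then sum over all $r$, which gives (exchanging the summations over $r$ and $\ell$, and using $\sum_r b_{\ell r}(\sqrt{\pi})\, \xi^r = (1 - \sqrt{\pi} + \sqrt{\pi}\xi)^\ell$):

$$\frac{v(C_1)}{n'} \xrightarrow{p} \frac{1}{Q} \sum_{k \geq 1} \Big[ p_k (1 - \gamma_k) + \sum_{d \geq k} d\, f(d,k,\pi)\, p_d \gamma_d \Big] \Big(1 - \big(1 - \pi^{1/2} + \pi^{1/2} \xi\big)^k\Big).$$

Using that $n'/n \xrightarrow{p} Q$ and $\tilde{n}/n \xrightarrow{p} \bar{\gamma}$, we obtain the limit of $v(C_1)/\tilde{n}$ stated in the theorem.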
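Putting the three steps together, the limit of $v(C_1)/\tilde n$ can be evaluated numerically. In the sketch below (hypothetical names; $p$ and $\gamma$ as dictionaries), $w_k := p_k(1-\gamma_k) + \sum_{d \geq k} d\, f(d,k,\pi)\, p_d\gamma_d$ is the bracket in the expression for $v(C_1)/n'$, so that $v(C_1)/n \to \sum_k w_k (1 - u^k)$ with $u = 1 - \pi^{1/2} + \pi^{1/2}\xi = 1 - \pi + \pi\zeta$. The normalisation uses $\tilde n/n \to \sum_d p_d (d\gamma_d + 1 - \gamma_d)$, which follows from the construction (a vertex with $X(i) = 1$ is replaced by $d_i$ vertices) and also equals $\sum_k w_k$. The clique enumeration again uses the connectivity recursion for $f$, assumed to agree with (6).

```python
from math import comb


def f(d, k, pi):
    """Component-size law in a percolated d-clique (connectivity recursion; assumed form of (6))."""
    q = 1.0 - pi
    c = [0.0, 1.0]
    for m in range(2, k + 1):
        c.append(1.0 - sum(comb(m - 1, j - 1) * c[j] * q ** (j * (m - j)) for j in range(1, m)))
    return comb(d - 1, k - 1) * c[k] * q ** (k * (d - k))


def giant_fraction(p, gamma, pi, tol=1e-15, max_iter=100_000):
    """Limit of v(C_1) / n_tilde for p = {d: p_d}, gamma = {d: gamma_d}."""
    dmax = max(p)
    a = [p.get(k, 0.0) * (1.0 - gamma.get(k, 0.0)) for k in range(dmax + 1)]      # X(i) = 0
    c = [0.0] + [sum(d * f(d, k, pi) * p.get(d, 0.0) * gamma.get(d, 0.0)
                     for d in range(k, dmax + 1)) for k in range(1, dmax + 1)]    # clique part
    Qk = [a[0]] + [a[k] + c[k] / k for k in range(1, dmax + 1)]   # Lemma 16 (unnormalised)
    w = [a[k] + c[k] for k in range(dmax + 1)]                     # vertices of G behind degree k

    def gp(x):  # proportional to g'(x); the factor Q cancels in gp(u) / lam
        return sum(k * q * x ** (k - 1) for k, q in enumerate(Qk) if k >= 1)

    lam = gp(1.0)
    u = 0.0   # u = 1 - pi + pi * zeta, smallest fixed point in [0, 1]
    for _ in range(max_iter):
        u_next = 1.0 - pi + pi * gp(u) / lam
        done = abs(u_next - u) < tol
        u = u_next
        if done:
            break
    n_tilde_over_n = sum(w)   # = sum_d p_d (d gamma_d + 1 - gamma_d)
    return sum(wk * (1.0 - u ** k) for k, wk in enumerate(w)) / n_tilde_over_n


print(giant_fraction({3: 1.0}, {}, 0.8), giant_fraction({3: 1.0}, {3: 1.0}, 0.8))
```

On 3-regular graphs with $\pi = 0.8$, replacing every vertex by a triangle lowers the limiting giant fraction from $1 - 4^{-3}$ to roughly $0.95$; these values can be compared with the Monte Carlo estimates of the simulated model.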

Let $\hat{C}_1$ be the largest component in $G_\pi$. Adding cliques changes the sizes of the connected components, hence we have to prove that $\hat{C}_1 = C_1$ whp. Let $C$ be any component of $G_\pi$ different from $C_1$. Its projection $\tilde{C} = \phi(C)$ is different from $\tilde{C}_1$, so Theorem 3.9 in [13] implies that $v(\tilde{C})/n \xrightarrow{p} 0$. Using Lemma 14 with $H = C$ shows that $v(C)/n \xrightarrow{p} 0$. Hence $C_1$ is the largest connected component of $G_\pi$ whp, which ends the proof.

5.4 Proof of Theorem 10

The difference with the previous proof is the following: instead of using Theorem 3.9 of [13] in Steps 2 and 3, we use Theorem 10 of [20].

Indeed, the first step is the same: the graph $G' = \phi(G_\pi^{\mathrm{in}})$ (where $G_\pi^{\mathrm{in}}$ is the graph obtained from $G(n, \mathbf{d}, \gamma)$ after a bond percolation on the edges inside cliques only) has asymptotic degree distribution $p' = (p'_k)_k$, with $p'_k = Q_k/Q$.

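We apply (a slight extension of) Theorem 10 in [20] to the graph $G'$ (with $t_{s\ell} = \mathbf{1}\{\ell = 0\}$). Let $v'_s$ be the number of vertices $i$ such that $X(i) = 0$ and that satisfy: the degree of $i$ in $G'$ (that is to say, before the bond percolation in $G'$) is $s$, and $i$ is active at the end of the process. Let $v'_{ds}$ be the number of vertices $i$ such that $X(i) = 1$ and that satisfy: the degree of $i$ in the original graph $\phi(G)$ is $d$, the degree of $i$ in $G'$ is $s \leq d$, and $i$ is active at the end of the process. The probability that such a node $i$ (with degree $d$ in $\phi(G)$ and $s$ in $G'$) does not belong to the original seed $S$ is $(1 - \alpha_d)^s$, since it stands for $s$ vertices of $G$, each of degree $d$, and initial activations are independent among nodes. Hence we have:

$$\frac{v'_s}{n'} \xrightarrow{p} p_s (1 - \gamma_s) \left[1 - (1 - \alpha_s)(1 - \pi + \pi\zeta)^s\right] / Q,$$

$$\frac{v'_{ds}}{n'} \xrightarrow{p} \frac{d}{s} f(d,s,\pi)\, p_d \gamma_d \left[1 - (1 - \alpha_d)^s (1 - \pi + \pi\zeta)^s\right] / Q,$$

where $\zeta$ is defined as in the statement of Theorem 10. In order to obtain $C(\pi, \alpha)$, we have to multiply $v'_{ds}$ by $s$, and sum over all $d$ and $s$, which gives:

$$\frac{C(\pi,\alpha)}{n'} \xrightarrow{p} \frac{1}{Q} \sum_s \Big( p_s (1 - \gamma_s) \left[1 - (1 - \alpha_s)(1 - \pi + \pi\zeta)^s\right] + \sum_{d \geq s} d\, f(d,s,\pi)\, p_d \gamma_d \left[1 - (1 - \alpha_d)^s (1 - \pi + \pi\zeta)^s\right] \Big),$$

and ends the proof (since $n'/n \xrightarrow{p} Q$ and $\tilde{n}/n \xrightarrow{p} \bar{\gamma}$).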
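The limit of $C(\pi,\alpha)$ is a finite sum once $\zeta$ is known. Since the equation defining $\zeta$ belongs to the statement of Theorem 10 (not reproduced in this section), the sketch below (hypothetical names) takes $\zeta$ as an input rather than solving for it, and returns the limit of $C(\pi,\alpha)/\tilde n$ using $\tilde n/n \to \sum_d p_d(d\gamma_d + 1 - \gamma_d)$. Two sanity checks follow from the formula itself: with $\alpha_s = 1$ for all $s$ every vertex is active, and with $\alpha = 0$ and $\zeta = 1$ none is; with $\gamma = 0$ it reduces to $\sum_s p_s[1 - (1-\alpha_s)(1-\pi+\pi\zeta)^s]$.

```python
from math import comb


def f(d, k, pi):
    """Component-size law in a percolated d-clique (connectivity recursion; assumed form of (6))."""
    q = 1.0 - pi
    c = [0.0, 1.0]
    for m in range(2, k + 1):
        c.append(1.0 - sum(comb(m - 1, j - 1) * c[j] * q ** (j * (m - j)) for j in range(1, m)))
    return comb(d - 1, k - 1) * c[k] * q ** (k * (d - k))


def diffusion_size(p, gamma, pi, alpha, zeta):
    """Limit of C(pi, alpha) / n_tilde for a given zeta (an input, not solved for here)."""
    dmax = max(p)
    y = 1.0 - pi + pi * zeta
    total = 0.0
    for s in range(dmax + 1):
        a_s = alpha.get(s, 0.0)
        # vertices with X(i) = 0 and degree s (the v'_s term)
        total += p.get(s, 0.0) * (1.0 - gamma.get(s, 0.0)) * (1.0 - (1.0 - a_s) * y ** s)
        if s == 0:
            continue
        # size-s components of cliques of size d >= s: v'_{ds} multiplied by s
        for d in range(s, dmax + 1):
            a_d = alpha.get(d, 0.0)
            total += (d * f(d, s, pi) * p.get(d, 0.0) * gamma.get(d, 0.0)
                      * (1.0 - (1.0 - a_d) ** s * y ** s))
    # n_tilde / n -> sum_d p_d (d gamma_d + 1 - gamma_d)
    n_tilde_over_n = sum(pd * (d * gamma.get(d, 0.0) + 1.0 - gamma.get(d, 0.0)) for d, pd in p.items())
    return total / n_tilde_over_n


p, gamma = {2: 0.3, 3: 0.4, 5: 0.3}, {3: 0.5, 5: 0.7}
print(diffusion_size(p, gamma, 0.6, {s: 0.05 for s in range(6)}, 0.3))
```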

5.5 Proof of Theorem 11

As before, we say that a vertex in $G(n, \mathbf{d}, \gamma)$ has parent $i \in \{1, \dots, n\}$ if it belongs to a clique that replaces the vertex $i$ of $G(n, \mathbf{d})$ (when $X(i) = 1$) or if it is $i$ itself (when $X(i) = 0$). For any graph $G$ and any vertex $v$ of $G$, let $D(v,t)$ be the subgraph of $G$ induced by the final set of active vertices when $v$ is the only vertex in the initial seed. With the notations of Section 4.2, the number of vertices in $D(v,t)$ is $C(v,t)$. When $H$ is a subgraph of $G$, we write $D(H,t)$ for the subgraph induced by the final set of active vertices in $G$ when the initial seed consists of the vertices of $H$.

We can compare an epidemic starting from a vertex $u$ in $G(n, \mathbf{d}, \gamma)$ with the epidemic that would have been generated by its parent $i$ in $G(n, \mathbf{d})$ (recall that thresholds are drawn such that a vertex $u$ has the same threshold as its parent $i$).

Proposition 17. Let $u$ be a vertex of $G = G(n, \mathbf{d}, \gamma)$. Let $i$ be its parent, and $K$ be the clique generated by $i$ if $X(i) = 1$ (otherwise, set $K = \{u\}$). Then we can bound the epidemic generated by $u$ as follows:

$$\phi(D(u,t)) \subset \phi(D(K,t)) \subset D(i,t).$$

The proof follows from Lemma 12 in Section 4.2. In addition, we have the following lemma, which is a consequence of [20]:

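Lemma 18. Assume $\sum_r r(r-1)\, p_r t_{r0} < \sum_r r\, p_r$. Let $u$ be a vertex chosen uniformly at random among the vertices of $G \sim G(n, \mathbf{d}, \gamma)$, and let $i$ be the parent of $u$. Then the size of the epidemic generated by $i$ in $\phi(G)$ is $C(i,t) = o_p(n)$.

Proof. We cannot directly use the result of [20] stating that, if $\sum_r r(r-1)\, p_r t_{r0} < \sum_r r\, p_r$ and $i$ is chosen uniformly at random in $G(n, \mathbf{d})$, then $C(i,t) = o_p(n)$: here $i$ is the parent of a uniform vertex of $G$, so it is not uniform in $G(n, \mathbf{d})$ (a vertex of degree $d$ is the parent of $d$ vertices when it is replaced by a clique). The idea is to apply Theorem 10 of [20] with a parameter $\alpha = (\alpha_d)_{d \geq 0}$ that satisfies $\alpha_d = (d\gamma_d + 1 - \gamma_d)\, \alpha$ for all $d$, $\alpha$ being a positive constant. Then the same arguments as for the proof of Theorem 11 (ii) in [20] work. □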
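One way to read the choice $\alpha_d = (d\gamma_d + 1 - \gamma_d)\alpha$ is through the law of the parent of a uniformly chosen vertex of $G$: a vertex of degree $d$ in $G(n,\mathbf{d})$ is the parent of $d$ vertices when $X(i) = 1$ and of one vertex otherwise, so the parent of a uniform vertex has degree $d$ with probability proportional to $p_d(d\gamma_d + 1 - \gamma_d)$. The Monte Carlo sketch below (hypothetical names; degrees drawn i.i.d. from $p$ as a simplifying assumption, and $\gamma_0 = 0$) checks this size bias.

```python
import random
from collections import Counter


def parent_degree_law(p, gamma):
    """Law of the degree (in G(n, d)) of the parent of a uniform vertex of G(n, d, gamma)."""
    weights = {d: pd * (d * gamma.get(d, 0.0) + 1.0 - gamma.get(d, 0.0)) for d, pd in p.items()}
    total = sum(weights.values())   # also the limit of n_tilde / n
    return {d: w / total for d, w in weights.items()}


def simulated_parent_degree_law(p, gamma, n=100_000, seed=0):
    """Monte Carlo version: degrees i.i.d. from p (simplifying assumption), X(i) ~ Bernoulli(gamma_d)."""
    rng = random.Random(seed)
    degrees = rng.choices(list(p), weights=list(p.values()), k=n)
    counts, n_vertices = Counter(), 0
    for d in degrees:
        copies = d if rng.random() < gamma.get(d, 0.0) else 1   # clique of size d, or i itself
        counts[d] += copies
        n_vertices += copies
    return {d: counts[d] / n_vertices for d in p}


p, gamma = {2: 0.3, 3: 0.4, 6: 0.3}, {3: 0.5, 6: 0.9}
print(parent_degree_law(p, gamma))
print(simulated_parent_degree_law(p, gamma))
```

To first order in $\alpha$, seeding each vertex of $G$ independently with probability $\alpha$ activates a parent of degree $d$ with probability $\gamma_d(1 - (1-\alpha)^d) + (1-\gamma_d)\alpha \approx (d\gamma_d + 1 - \gamma_d)\alpha$, which has the same form as $\alpha_d$.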

This allows us to deduce case (ii) of Theorem 11 easily: assume $\sum_r r(r-1)\, p_r t_{r0} < \sum_r r\, p_r$. Then combining Proposition 17 and Lemma 18 gives that the number of vertices in $\phi(D(u,t))$ is $o_p(n)$ if $u$ is chosen uniformly at random in $G$. Applying Lemma 14 with $H = D(u,t)$ concludes the proof of case (ii).

We now assume that the cascade condition is satisfied. 

The proof of (15) is a consequence of a result from [20]. Indeed, let $G$ be distributed as $G(n, \mathbf{d}, \gamma)$. Let $H$ (resp. $\tilde{H}$) be the subgraph of $G$ (resp. $\phi(G)$) induced by the vertices of threshold zero. Note that $\phi(H) = \tilde{H}$. We use Theorem 11 in [20] for the graph $\phi(G)$ (with parameter $\pi = 1$): it gives the components' sizes in $\tilde{H}$. Let $\hat{C}_1$ (resp. $\tilde{C}_1$) be the largest connected component in $H$ (resp. $\tilde{H}$). The number $v_r(\tilde{C}_1)$ of vertices with degree $r$ in $\tilde{C}_1$ is computed in the proof of Theorem 11 in [20]: $v_r(\tilde{C}_1)/n \xrightarrow{p} p_r t_{r0} (1 - \xi^r)$, where $\xi$ is defined in (15). Hence we can deduce the size of the connected component $C_1$ in $H$ such that $\phi(C_1) = \tilde{C}_1$:

$$\frac{v(C_1)}{\tilde{n}} \xrightarrow{p} \sum_d \left[d\gamma_d + (1 - \gamma_d)\right] p_d t_{d0} (1 - \xi^d) / \bar{\gamma}.$$

Showing that $C_1 = \hat{C}_1$ whp is similar to the end of the proof of Theorem 8. This ends the proof of (15).

The idea for the rest of the proof is to construct a coupling between the epidemic on $G$ (with threshold distribution $t$) and an epidemic on $\phi(G)$ with a different threshold distribution, which we call $t' = (t'_{s\ell})_{s,\ell}$.

Proposition 19. Assume the epidemic on $G \sim G(n, \mathbf{d}, \gamma)$ starts from a vertex $u$ that has threshold zero, and let $i$ be the parent of $u$ in $\phi(G)$. We consider the following distribution of thresholds $(t'_{s\ell})_{0 \leq \ell \leq s}$ for each $s \geq 0$:

• $t'_{s0} = t_{s0}$;

• $t'_{s\ell} = (1 - \gamma_s)\, t_{s\ell}$ for all $0 < \ell < s$;

• $t'_{ss} = (1 - \gamma_s)\, t_{ss} + \gamma_s (1 - t_{s0})$.

Then there exist random thresholds $(k'(j))_{1 \leq j \leq n}$ with the distribution $t' = (t'_{s\ell})_{0 \leq \ell \leq s}$ defined above such that

$$\phi(D(u,t)) = D(i,t'),$$

where $D(i,t')$ is the subgraph induced by the final set of active vertices in the symmetric threshold model starting from $i$ in $\phi(G)$, with threshold distribution $t' = (t'_{s\ell})_{s,\ell}$.

Proof. Note that each vertex $i$ of $\phi(G)$ has two thresholds: $k(i)$ (which we used to define the epidemic on $G$), and the new threshold $k'(i)$, which we now define in order to compare with the epidemic on $G$. Until the end of the proof, unless otherwise specified, 'the threshold of $i$' refers to $k'(i)$.

We make explicit the natural coupling between the symmetric threshold model with parameter $t$ in $G$ and the one with parameter $t'$ in $\phi(G)$. If $u$ belongs to a clique, then the whole clique becomes active at the next step, so we can start the epidemic in $\phi(G) \sim G(n, \mathbf{d})$ from the parent $i$ of $u$. Let $v$ be the neighbor of $u$ outside the clique, and let $j$ be the parent of $v$. If the threshold $k(v)$ of vertex $v$ in $G$ is zero, then $v$ (and its whole clique if it has one) becomes active (see Lemma 12). In this case, we choose $k'(j) := 0$ for the threshold of $j$ in $\phi(G)$, so that $j$ also becomes active in $\phi(G)$. If $k(v) > 0$, there are two cases, where $s$ denotes the degree of $j$:

• If $X(j) = 1$, vertex $v$ and its clique stay inactive (Lemma 12). In this case, we choose $k'(j) := s$ for the threshold of $j$ (so that it stays inactive).

• If $X(j) = 0$, vertex $v$ becomes active if and only if it has at least $k(v) + 1$ active neighbors. So we set $k'(j) := k(v) = k(j)$ for the threshold of $j$.

Since the random variables $X(j)$, for $j$ in $\phi(G)$, are independent, the thresholds we associate to the nodes are also independent. In addition, one easily checks that the conditional distribution of the thresholds, given that the degree of the node is $s$, is $(t'_{s\ell})_{0 \leq \ell \leq s}$. In fact, the epidemic we consider in $\phi(G)$ is almost the same as the one with parameter $t$, except that we randomly set some nodes $j$ (those such that $X(j) = 1$ and $k(j) > 0$) to a threshold so high that they stay inactive. □
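The coupling in this proof can be simulated directly. In the sketch below (hypothetical names; $t$ encoded as a dictionary mapping each degree $s$ to the list $(t_{s0}, \dots, t_{ss})$), `coupled_threshold` applies the three rules above to produce $k'(j)$ from $X(j)$ and $k(j)$, and `t_prime` implements the distribution of Proposition 19; the empirical law of $k'(j)$ should match $t'$.

```python
import random


def t_prime(t, gamma):
    """Threshold distribution t' of Proposition 19; t[s] = [t_s0, ..., t_ss]."""
    out = {}
    for s, row in t.items():
        g = gamma.get(s, 0.0)
        new = [row[0]] + [(1.0 - g) * x for x in row[1:]]
        new[s] += g * (1.0 - row[0])    # t'_ss = (1 - gamma_s) t_ss + gamma_s (1 - t_s0)
        out[s] = new
    return out


def coupled_threshold(s, x_j, k_j):
    """Threshold k'(j) used on phi(G) for a node j of degree s (rules of the proof)."""
    if k_j == 0:
        return 0        # v is activated, so j is activated too
    if x_j == 1:
        return s        # clique with positive threshold stays inactive
    return k_j          # no clique: same threshold as in G


def empirical_t_prime(t, gamma, s, trials=100_000, seed=0):
    """Empirical law of k'(j) for nodes of degree s."""
    rng = random.Random(seed)
    counts = [0] * (s + 1)
    for _ in range(trials):
        x_j = 1 if rng.random() < gamma.get(s, 0.0) else 0
        k_j = rng.choices(range(s + 1), weights=t[s])[0]
        counts[coupled_threshold(s, x_j, k_j)] += 1
    return [c / trials for c in counts]


t = {3: [0.2, 0.3, 0.1, 0.4], 4: [0.5, 0.1, 0.1, 0.1, 0.2]}
gamma = {3: 0.6, 4: 0.25}
print(t_prime(t, gamma)[3], empirical_t_prime(t, gamma, 3))
```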

More precisely, let $C_{s\ell}(u,t)$ (resp. $C'_{s\ell}(i,t')$) be the final number of active vertices with degree $s \geq 0$ and threshold $\ell$ at the end of the symmetric threshold epidemic on $G$ (resp. $\phi(G)$), with threshold parameter $t$ (resp. $t'$), when the only vertex in the initial seed is $u$ (resp. $i$). Then, using the coupling described above, we have the following result, for each degree $s \geq 0$:

• $C_{s0}(u,t) = C'_{s0}(i,t')\,[s Y_s + (1 - Y_s)]$, where $Y_s$ is the proportion of vertices $j$ in $\phi(G)$ such that $X(j) = 1$, among those that have degree $s$ and that belong to the cascade triggered by $i$.

• For all $\ell \neq 0$, we have $C_{s\ell}(u,t) = C'_{s\ell}(i,t')$, since the vertices of positive threshold that belong to the cascade triggered by $u$ are exactly those that are not replaced by a clique.

We have that $Y_s \xrightarrow{p} \gamma_s$ for all $s$, and the limit of $C'_{s\ell}(i,t')$ is given by the following lemma (which is a slight extension of Theorem 11 in [20]):

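Lemma 20. Assume (using the notations of Theorem 11) that $\zeta = 0$ or $\zeta$ is such that there exists $\epsilon > 0$ with $\lambda z^2 < h(z)$ for $z \in (\zeta - \epsilon, \zeta)$. Then, for any $i$ that belongs to the set of pivotal players in $\phi(G)$, we have:

$$\frac{C'_{s\ell}(i,t')}{n} \xrightarrow{p} p_s t'_{s\ell} \Big(1 - \sum_{r \geq s - \ell} b_{sr}(\zeta)\Big).$$

In particular, for $\ell = 0$, we have $C'_{s0}(i,t')/n \xrightarrow{p} p_s t'_{s0} (1 - \zeta^s)$.

Proof. By a slight extension of Theorem 11 in [20], the number of inactive nodes (divided by $n$) with original degree $s$, degree $r$ in the graph of inactive nodes and threshold $\ell$ tends to $p_s t'_{s\ell}\, b_{sr}(\zeta)\, \mathbf{1}\{r \geq s - \ell\}$: such a node stays inactive if and only if it has at most $\ell$ active neighbors. Hence, summing over $r$, the number of inactive nodes (divided by $n$) with original degree $s$ and threshold $\ell$ tends to $p_s t'_{s\ell} \sum_{r \geq s - \ell} b_{sr}(\zeta)$, which ends the proof. □

We assume that $\zeta = 0$ or $\zeta$ is such that there exists $\epsilon > 0$ with $\lambda z^2 < h(z)$ for $z \in (\zeta - \epsilon, \zeta)$. Let $u$ be a vertex in $G$ whose parent $i$ belongs to the set of pivotal players in $\phi(G)$. Let $C_s(u,t)$ be the final number of active vertices with degree $s \geq 0$ at the end of the symmetric threshold epidemic on $G$, with threshold parameter $t$, when the only vertex in the initial seed is $u$. Then we have:

$$\frac{C_s(u,t)}{n} \xrightarrow{p} (s\gamma_s + 1 - \gamma_s)\, p_s t'_{s0} (1 - \zeta^s) + p_s \sum_{\ell \neq 0} t'_{s\ell} \Big(1 - \sum_{r \geq s - \ell} b_{sr}(\zeta)\Big).$$

Using the definition of $t'$, we have that $t'_{s0} = t_{s0}$ and:

$$\sum_{\ell \neq 0} t'_{s\ell} \sum_{r \geq s - \ell} b_{sr}(\zeta) = \sum_{\ell \neq 0} (1 - \gamma_s)\, t_{s\ell} \sum_{r \geq s - \ell} b_{sr}(\zeta) + \gamma_s (1 - t_{s0}),$$

which finally gives (since $\sum_{\ell \neq 0} t'_{s\ell} = 1 - t_{s0}$):

$$\frac{C_s(u,t)}{n} \xrightarrow{p} p_s \Big[ (s\gamma_s + 1 - \gamma_s)\, t_{s0} (1 - \zeta^s) + (1 - \gamma_s) \Big(1 - t_{s0} - \sum_{\ell \neq 0} t_{s\ell} \sum_{r \geq s - \ell} b_{sr}(\zeta)\Big) \Big].$$

Then, by an argument similar to the one at the end of the proof of Theorem 8 or to equation (16), $u$ belongs to the set of pivotal players in $G$, which ends the proof.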
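The algebra leading from the limit written with $t'$ to the final expression written with $t$ can be double-checked numerically. The sketch below (hypothetical names) evaluates both limits of $C_s(u,t)/n$ for a single degree $s$, given $p_s$, $\gamma_s$, the thresholds $(t_{s0}, \dots, t_{ss})$ and an arbitrary $\zeta \in [0,1]$, with $b_{sr}(z) = \binom{s}{r} z^r (1-z)^{s-r}$. The value of $\zeta$ is an input here, not the solution of the equation in Theorem 11; the identity holds for every $\zeta$.

```python
from math import comb


def b(s, r, z):
    """b_{sr}(z) = C(s, r) z^r (1 - z)^(s - r)."""
    return comb(s, r) * z ** r * (1.0 - z) ** (s - r)


def tail(s, ell, z):
    """sum_{r >= s - ell} b_{sr}(z): probability of at most ell active neighbours."""
    return sum(b(s, r, z) for r in range(max(s - ell, 0), s + 1))


def limit_with_t_prime(p_s, gamma_s, t_s, zeta):
    """Limit of C_s(u, t) / n written with t' (Proposition 19)."""
    s = len(t_s) - 1
    tp = [t_s[0]] + [(1.0 - gamma_s) * x for x in t_s[1:]]
    tp[s] += gamma_s * (1.0 - t_s[0])
    value = (s * gamma_s + 1.0 - gamma_s) * p_s * tp[0] * (1.0 - zeta ** s)
    value += p_s * sum(tp[ell] * (1.0 - tail(s, ell, zeta)) for ell in range(1, s + 1))
    return value


def limit_with_t(p_s, gamma_s, t_s, zeta):
    """Final limit of C_s(u, t) / n, written with t only."""
    s = len(t_s) - 1
    inner = sum(t_s[ell] * tail(s, ell, zeta) for ell in range(1, s + 1))
    return p_s * ((s * gamma_s + 1.0 - gamma_s) * t_s[0] * (1.0 - zeta ** s)
                  + (1.0 - gamma_s) * (1.0 - t_s[0] - inner))


print(limit_with_t_prime(0.3, 0.4, [0.2, 0.3, 0.1, 0.4], 0.35),
      limit_with_t(0.3, 0.4, [0.2, 0.3, 0.1, 0.4], 0.35))
```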



5.6 Proof of Theorem 13



We use the same idea as in the previous proof. The same statement as Proposition 19 holds when the epidemic starts from a set (instead of a single vertex $u$). Indeed, let $\tilde{S}$ be the initial seed in $\phi(G)$. By definition, the initial seed $S$ in $G$ consists of the vertices whose parent belongs to $\tilde{S}$.

Let $C_{s\ell}(S,t)$ (resp. $C'_{s\ell}(\tilde{S},t')$) be the final number of active vertices with degree $s \geq 0$ and threshold $\ell$ at the end of the symmetric threshold epidemic on $G$ (resp. $\phi(G)$), with threshold parameter $t$ (resp. $t'$, defined in Proposition 19), when the initial seed is $S$ (resp. $\tilde{S}$).

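Using a slight extension of Theorem 10 in [20], we have, for all $s \geq 0$ and $\ell \geq 0$:

$$\frac{C'_{s\ell}(\tilde{S},t')}{n} \xrightarrow{p} p_s t'_{s\ell}\, \alpha_s + p_s t'_{s\ell} (1 - \alpha_s) \Big(1 - \sum_{r \geq s - \ell} b_{sr}(\zeta)\Big),$$

where $\zeta$ is defined as in the statement of the theorem. More precisely, the first term $p_s t'_{s\ell} \alpha_s$ comes from the vertices that belong to the initial seed $\tilde{S}$, and the second one $p_s t'_{s\ell} (1 - \alpha_s)\big(1 - \sum_{r \geq s - \ell} b_{sr}(\zeta)\big)$ comes from those that are activated during the process. In order to obtain the asymptotics of $C_{s\ell}(S,t)/n$, we have to multiply the first term by $(s\gamma_s + 1 - \gamma_s)$. The multiplicative constant for the second term depends on the value of the threshold $\ell$: if $\ell = 0$, we multiply the second term by $(s\gamma_s + 1 - \gamma_s)$, and if $\ell > 0$, we multiply it by 1 (since the vertices with positive threshold that are activated during the process necessarily do not belong to a clique). Summing over $s$ and $\ell$ and replacing $t'$ by its expression gives the following limit, as $n \to \infty$:

$$\begin{aligned}
\frac{C(t,\alpha)}{n} \xrightarrow{p}\ & \sum_s p_s t_{s0} (s\gamma_s + 1 - \gamma_s) \left[\alpha_s + (1 - \alpha_s)(1 - \zeta^s)\right] \\
& + \sum_s p_s (1 - \gamma_s)\, \alpha_s (s\gamma_s + 1 - \gamma_s)(1 - t_{s0}) \\
& + \sum_s p_s (1 - \gamma_s)(1 - \alpha_s) \Big[(1 - t_{s0}) - \sum_{\ell \neq 0} t_{s\ell} \sum_{r \geq s - \ell} b_{sr}(\zeta)\Big] \\
& + \sum_s p_s \gamma_s (1 - t_{s0})\, \alpha_s (s\gamma_s + 1 - \gamma_s).
\end{aligned}$$

Gathering some terms and using that $\tilde{n}/n \xrightarrow{p} \bar{\gamma}$ ends the proof of Theorem 13.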
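The four-term expression can be compared with the direct sum over $s$ and $\ell$ described before it (seed term multiplied by $s\gamma_s + 1 - \gamma_s$; activated term multiplied by $s\gamma_s + 1 - \gamma_s$ if $\ell = 0$ and by 1 otherwise). In the sketch below (hypothetical names; $\zeta$ supplied as an input; per-degree data given as lists indexed by $s$), the two evaluations agree up to rounding for any admissible inputs.

```python
from math import comb


def tail(s, ell, z):
    """sum_{r >= s - ell} b_{sr}(z)."""
    return sum(comb(s, r) * z ** r * (1 - z) ** (s - r) for r in range(max(s - ell, 0), s + 1))


def direct_sum(p, gamma, t, alpha, zeta):
    """Sum over s and ell of the multiplied limits of C'_{s ell}(S~, t') / n."""
    total = 0.0
    for s, p_s in enumerate(p):
        g, a, row = gamma[s], alpha[s], t[s]
        tp = [row[0]] + [(1 - g) * x for x in row[1:]]
        tp[s] += g * (1 - row[0])                      # t' from Proposition 19
        m = s * g + 1 - g                              # vertices of G per vertex of phi(G)
        for ell in range(s + 1):
            seed = p_s * tp[ell] * a
            activated = p_s * tp[ell] * (1 - a) * (1 - tail(s, ell, zeta))
            total += m * seed + (m if ell == 0 else 1.0) * activated
    return total


def four_terms(p, gamma, t, alpha, zeta):
    """The four sums of the displayed limit of C(t, alpha) / n."""
    total = 0.0
    for s, p_s in enumerate(p):
        g, a, row = gamma[s], alpha[s], t[s]
        m = s * g + 1 - g
        inner = sum(row[ell] * tail(s, ell, zeta) for ell in range(1, s + 1))
        total += p_s * row[0] * m * (a + (1 - a) * (1 - zeta ** s))
        total += p_s * (1 - g) * a * m * (1 - row[0])
        total += p_s * (1 - g) * (1 - a) * ((1 - row[0]) - inner)
        total += p_s * g * (1 - row[0]) * a * m
    return total


p = [0.0, 0.2, 0.3, 0.5]
gamma = [0.0, 0.0, 0.4, 0.7]
t = [[1.0], [0.3, 0.7], [0.2, 0.5, 0.3], [0.1, 0.3, 0.4, 0.2]]
alpha = [0.0, 0.05, 0.1, 0.2]
zeta = 0.45
print(direct_sum(p, gamma, t, alpha, zeta), four_terms(p, gamma, t, alpha, zeta))
```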



6 Conclusions 

To our knowledge, our analysis is the first systematic study of random graphs with both a tunable asymptotic degree distribution and a tunable clustering coefficient. Our model has the advantage of remaining tractable for the analysis of the diffusion and symmetric threshold models.

For both models, we derive explicit formulas for the cascade condition, i.e. the condition under which a single infected individual can turn a positive fraction of the population into infected individuals. When such a cascade is possible, its size is given analytically. In the case of random regular graphs, we proved that clustering 'inhibits' the diffusion process. Numerical evaluations also show that clustering decreases the cascade size of the diffusion process for regular graphs, and 'inhibits' the diffusion process for power-law graphs. The impact of clustering on the symmetric threshold model is studied in the particular case of the contagion model described in Section 4.3. Numerical evaluations show that the effect of clustering on the contagion process depends on the mean degree of the graph: while clustering 'inhibits' the contagion for a low mean degree, the contrary happens in the high mean degree regime. When a cascade is possible, we observe that clustering decreases its size.

In addition, we can also compute the cascade size explicitly in the case of a degree-based activation, for both the diffusion and the symmetric threshold models. This theoretical analysis paves the way for a possible control of such epidemic processes, as done in [6] or [19].



A Branching process approximation for the diffusion threshold (with a single activation)

We can guess the value of the diffusion threshold $\pi_c$ given in Theorem 8 using a branching process approximation. Indeed, the random graph $G(n, \mathbf{d})$ can be approximated by a branching process $T$ in which each node (except the root) has a number of offspring distributed as $D^*$. We add cliques in this branching process as in $G(n, \mathbf{d}, \gamma)$, which gives a graph $G_T$. We then proceed in two steps: first we delete independently with probability $1 - \pi$ (in $G_T$) each "internal" edge, i.e. each edge inside a clique; second we delete independently with probability $1 - \pi$ (in the new graph) each "external" edge, i.e. each edge outside cliques.



[Figure: panels "Before percolation inside cliques" and "After percolation inside cliques".]




After the first deletion of edges, we get a new graph $G'_T$ in which original cliques can be broken into several components. If we make the equivalent transformation in the original branching process $T$, it means that a node can lose some of its children.



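More precisely, we consider a node $i$ in the branching process $T$. Let $e$ be the edge of $i$ that links $i$ to the previous generation. The degree of $i$ is distributed as $D^* + 1$; we assume that $D^* + 1 = d$. In $G_T$, node $i$ is replaced by a clique $K$ with probability $\gamma_d$. In that case, let $v$ be the vertex in $K$ whose edge outside the clique is $e$. After having deleted each edge inside the clique independently with probability $1 - \pi$, the probability that the component of vertex $v$ inside $K$ contains $k$ vertices (including $v$ itself) is given by $f(d,k,\pi)$. Hence the probability that $v$ is linked to $k$ vertices (including the one linked by $e$) is $(1 - \gamma_d)\mathbf{1}\{d = k\} + \gamma_d f(d,k,\pi) = \mathbb{P}\left(\mathcal{K}(D^* + 1, \pi, \gamma) = k \mid D^* + 1 = d\right)$. The new distribution of offspring in the corresponding branching process $T'$ is thus $\mathcal{K}(D^* + 1, \pi, \gamma) - 1$. Finally, we remove each (external) edge with probability $1 - \pi$, which gives $\pi\, \mathbb{E}\left[\mathcal{K}(D^* + 1, \pi, \gamma) - 1\right]$ for the expected number of offspring. Since a branching process survives with positive probability if and only if its mean offspring number exceeds 1 (outside the degenerate case where every node has exactly one child), this suggests that a giant component exists if and only if $\pi\, \mathbb{E}\left[\mathcal{K}(D^* + 1, \pi, \gamma) - 1\right] > 1$, which defines $\pi_c$.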
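The heuristic can be checked by simulating one generation of the branching process. The sketch below (hypothetical names; $p$ and $\gamma$ as dictionaries) samples $D^*+1$ from the size-biased law $d\,p_d/\lambda$, replaces the node by a clique with probability $\gamma_d$, percolates the clique edges, counts the $k - 1$ potential children reached through the component of $v$, keeps each external edge with probability $\pi$, and compares the empirical mean with $\pi\,\mathbb{E}[\mathcal{K}(D^*+1,\pi,\gamma) - 1]$. The exact value uses the connectivity recursion for $f$, which is an assumption about the form of (6).

```python
import random
from math import comb


def f(d, k, pi):
    """Component-size law in a percolated d-clique (connectivity recursion; assumed form of (6))."""
    q = 1.0 - pi
    c = [0.0, 1.0]
    for m in range(2, k + 1):
        c.append(1.0 - sum(comb(m - 1, j - 1) * c[j] * q ** (j * (m - j)) for j in range(1, m)))
    return comb(d - 1, k - 1) * c[k] * q ** (k * (d - k))


def mean_offspring_exact(p, gamma, pi):
    """pi * E[K(D* + 1, pi, gamma) - 1] with P(D* + 1 = d) = d p_d / lambda."""
    lam = sum(d * pd for d, pd in p.items())
    total = 0.0
    for d, pd in p.items():
        g = gamma.get(d, 0.0)
        in_clique = sum((k - 1) * f(d, k, pi) for k in range(1, d + 1))
        total += d * pd / lam * ((1.0 - g) * (d - 1) + g * in_clique)
    return pi * total


def sample_offspring(p, gamma, pi, rng):
    """Number of children of one node of the percolated branching process T'."""
    degrees = list(p)
    d = rng.choices(degrees, weights=[x * p[x] for x in degrees])[0]   # D* + 1
    k = d
    if rng.random() < gamma.get(d, 0.0):            # node replaced by a d-clique
        adj = [[] for _ in range(d)]
        for a in range(d):
            for b in range(a + 1, d):
                if rng.random() < pi:               # internal edge kept
                    adj[a].append(b)
                    adj[b].append(a)
        seen, stack = {0}, [0]                      # component of v (label 0)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        k = len(seen)
    # the k external edges include e (towards the parent): k - 1 potential children,
    # each external edge kept independently with probability pi
    return sum(rng.random() < pi for _ in range(k - 1))


def mean_offspring_simulated(p, gamma, pi, trials=50_000, seed=0):
    rng = random.Random(seed)
    return sum(sample_offspring(p, gamma, pi, rng) for _ in range(trials)) / trials


p, gamma = {3: 0.5, 4: 0.5}, {3: 0.7, 4: 0.7}
print(mean_offspring_exact(p, gamma, 0.5), mean_offspring_simulated(p, gamma, 0.5))
```

Without cliques the exact mean reduces to $\pi\,\mathbb{E}[D^*]$, e.g. $2\pi$ for 3-regular graphs, recovering the classical threshold $\pi_c = 1/2$.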